Hi,
I am struggling to get the MultipleOutputFiles example to work. I am running a Hadoop cluster with CDH4 (Hadoop 2.0.0) and a working Dumbo installation.
Running splitwordcount.py locally returns the results in a single file, just as explained in the tutorial.
When I run it on Hadoop, the map phase completes without problems, but during the reduce phase every reducer attempt hits the following error and the job fails:
java.io.IOException: subprocess still running
R/W/S=82260/24/0 in:NA [rec/s] out:NA [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
HOST=null
USER=mapred
HADOOP_USER=null
last Hadoop input: |null|
last tool output: |9812|
Date: Wed Jan 23 15:48:35 CET 2013
Broken pipe
at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:131)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:492)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.Child.main(Child.java:262)