I'm trying out this feature. I set an output folder and save the output as text so I can inspect the results from the Hadoop web interface, but it's not working for me.
If I run the first code as is, I get odd part files empty, and the other ones are not sorted at all
(I collected reduceBucket as in your second example):
# Content of "/tmp/sort_test/part-r-00001"
4.0 0.33340575088476676 1.0
# Content of "/tmp/sort_test/part-r-00003"
3.0 0.2227320245765758 1.0
# Content of "/tmp/sort_test/part-r-00009"
1.0 2.7771179046576206E-4 1.0
The relevant part of my sessionInfo(), to confirm versions:
other attached packages:
[1] codetools_0.2-8 rJava_0.9-4 Rhipe_0.73.1-2
And Hadoop version:
# hadoop version
Hadoop 2.0.0-cdh4.3.0
Running your second example works better, but if I set two fewer reducers, as you said:
# /tmp/sort_test/part-r-00000 is empty
# Content of "/tmp/sort_test/part-r-00001"
1.0 2.7771179046576206E-4 1.0
# Content of "/tmp/sort_test/part-r-00047"
47.0 0.9392276984544847 1.0
If I run with the same number of reducers as intervals (50 reducers):
# Content of "/tmp/sort_test/part-r-00000"
48.0 0.9591911839115821 1.0
# Content of "/tmp/sort_test/part-r-00001"
1.0 2.7771179046576206E-4 1.0
# Content of "/tmp/sort_test/part-r-00049"
49.0 0.9797495281761285 1.0
So it seems that the numeric version is not working for me; maybe it is a version issue.
For the integer version, it seems that it should work by just subtracting 1 from "whichReducer", but that results in an IndexOutOfBoundsException.
I'm pasting the error trace in case it helps:
2013-07-26 10:47:09,949 INFO org.godhuli.rhipe.RHMRHelper: Mapper:Started Output Thread
2013-07-26 10:47:10,121 WARN org.godhuli.rhipe.RHMRHelper: Mapper:java.lang.IndexOutOfBoundsException: Index: 0
at java.util.Collections$EmptyList.get(Collections.java:3164)
at org.godhuli.rhipe.REXPProtos$REXP.getIntValue(REXPProtos.java:306)
at org.godhuli.rhipe.RHInteger.readFields(RHInteger.java:54)
at org.godhuli.rhipe.RHMRHelper$MROutputThread.readRecord(RHMRHelper.java:314)
at org.godhuli.rhipe.RHMRHelper$MROutputThread.run(RHMRHelper.java:337)
2013-07-26 10:47:20,124 INFO org.godhuli.rhipe.RHMRHelper: Mapper:MRErrorThread done
2013-07-26 10:47:20,126 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-07-26 10:47:20,129 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: MROutput/MRErrThread failed:java.lang.IndexOutOfBoundsException: Index: 0
at java.util.Collections$EmptyList.get(Collections.java:3164)
at org.godhuli.rhipe.REXPProtos$REXP.getIntValue(REXPProtos.java:306)
at org.godhuli.rhipe.RHInteger.readFields(RHInteger.java:54)
at org.godhuli.rhipe.RHMRHelper$MROutputThread.readRecord(RHMRHelper.java:314)
at org.godhuli.rhipe.RHMRHelper$MROutputThread.run(RHMRHelper.java:337)
2013-07-26 10:47:20,130 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: MROutput/MRErrThread failed:java.lang.IndexOutOfBoundsException: Index: 0
at java.util.Collections$EmptyList.get(Collections.java:3164)
at org.godhuli.rhipe.REXPProtos$REXP.getIntValue(REXPProtos.java:306)
at org.godhuli.rhipe.RHInteger.readFields(RHInteger.java:54)
at org.godhuli.rhipe.RHMRHelper$MROutputThread.readRecord(RHMRHelper.java:314)
at org.godhuli.rhipe.RHMRHelper$MROutputThread.run(RHMRHelper.java:337)
at org.godhuli.rhipe.RHMRHelper.checkOuterrThreadsThrowable(RHMRHelper.java:244)
at org.godhuli.rhipe.RHMRMapper.run(RHMRMapper.java:68)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)