Hi Antonio,
I am hitting an unexpected MapReduce failure and cannot figure out what the problem is. Most likely it is related to the compatibility of R (rmr2) with MapReduce 2.0.
1) Tool versions
HDFS 2.4.0.2.1
YARN + MapReduce2 2.4.0.2.1
Hadoop 2.0 is deployed on the cluster nodes.
RStudio Version 0.98.507
rmr2: rmr-3.1.1
rhdfs: rhdfs-1.0.8
2) I have set the environment variables in RStudio:
Sys.setenv(HADOOP_CMD = "/usr/bin/hadoop")
Sys.setenv(HADOOP_STREAMING = "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar")
3) I ran the very simple R program from the wiki guide, but it failed. I have attached the syslog from the MapReduce job history, as there is no stderr.
small.ints = to.dfs(1:1000)
from.dfs(small.ints)
mapreduce(
input = "",
map = function(k, v) cbind(v, v^2))
4) I also ran the following streaming command directly, but got the same error.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -input hdfs://HadoopServer2:8020/user/hdfs/hdp/in1/simpleRInput.txt -output hdfs://HadoopServer2:8020/user/hdfs/hdp/out/ -mapper simpleR.R -file /home/hdfs/simpleR.R
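A quick local check of the mapper script can help separate script problems from cluster problems. The following is only a minimal sketch: check_mapper is a hypothetical helper name, and it assumes the mapper is invoked directly by file name (as in the command above), so the script needs an interpreter line and the executable bit.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop or RHadoop): sanity-check a
# streaming mapper script before submitting the job.
# Prints "ok" if no problem is found, otherwise one line per problem.
check_mapper() {
    script="$1"
    problems=0

    # The script must exist as a regular file.
    if [ ! -f "$script" ]; then
        echo "missing: $script"
        return 1
    fi

    # Assumption: the mapper is run directly by name, so the first line
    # must be an interpreter line such as "#!/usr/bin/env Rscript".
    first_line=$(head -n 1 "$script")
    case "$first_line" in
        '#!'*) ;;
        *) echo "no shebang line"; problems=1 ;;
    esac

    # The script must also be executable.
    if [ ! -x "$script" ]; then
        echo "not executable"
        problems=1
    fi

    if [ "$problems" -eq 0 ]; then
        echo "ok"
    fi
    return "$problems"
}
```

For example, `check_mapper /home/hdfs/simpleR.R` would be run on the node where the script lives. It does not verify that Rscript is installed on every cluster node, which would need to be checked separately.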
5) In contrast, the following command, which uses standard Unix commands as mapper and reducer, works fine.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -input hdfs://HadoopServer2:8020/bigdatalab/input -output hdfs://HadoopServer2:8020/bigdatalab/streamoutput -mapper /bin/cat -reducer /usr/bin/wc
I also wrote a Python script that prints HelloWorld and ran it as a MapReduce job; that worked as well.
Can you advise whether there is a bug in the RHadoop packages or whether I have set something up incorrectly?