R script cannot work in MapReduce 2.0 with RHadoop


Kent Jiang

Jun 10, 2014, 1:15:09 AM
to rha...@googlegroups.com, Kent....@perficientgdc.com.cn
Hi Antonio,

I ran into an unexpected MapReduce job failure and cannot figure out what the problem is. Most likely it is an R compatibility issue with MapReduce 2.0.

1) Tool versions
HDFS 2.4.0.2.1
YARN + MapReduce2 2.4.0.2.1

Hadoop 2.0 is deployed on the cluster nodes.

RStudio Version 0.98.507
rmr2: rmr-3.1.1 
rhdfs: rhdfs-1.0.8

2) I have set the environment variables in RStudio:
Sys.setenv(HADOOP_CMD = "/usr/bin/hadoop")
Sys.setenv(HADOOP_STREAMING = "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar")  
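As a sanity check (just a sketch of what I mean), something like the following confirms both paths point at real files before the packages are loaded; hdfs.init() is the usual rhdfs initialization call:

# sketch: fail early if either environment variable points at a missing file
stopifnot(file.exists(Sys.getenv("HADOOP_CMD")),
          file.exists(Sys.getenv("HADOOP_STREAMING")))
library(rmr2)
library(rhdfs)
hdfs.init()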

3) I executed the very simple R program below, which is from the wiki guide, but it failed. I attached the syslog from the MapReduce job history, as there is no stderr.

small.ints = to.dfs(1:1000)
from.dfs(small.ints)
mapreduce(
input = "", 
map = function(k, v) cbind(v, v^2))

4) I executed the following command directly and also got the same error.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -input hdfs://HadoopServer2:8020/user/hdfs/hdp/in1/simpleRInput.txt -output hdfs://HadoopServer2:8020/user/hdfs/hdp/out/ -mapper simpleR.R -file /home/hdfs/simpleR.R
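For reference, simpleR.R is roughly along these lines (a sketch of the kind of minimal mapper I mean, not the exact file); with a plain streaming job the script normally also needs an Rscript shebang and execute permission (chmod +x) when it is passed directly as -mapper:

#!/usr/bin/env Rscript
# minimal streaming mapper sketch: echo each stdin line back to stdout
con <- file("stdin", open = "r")
while (length(line <- readLines(con, n = 1)) > 0) {
  cat(line, "\n", sep = "")
}
close(con)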

5) However, the following command, which uses Unix shell utilities as mapper and reducer, works well.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -input hdfs://HadoopServer2:8020/bigdatalab/input -output hdfs://HadoopServer2:8020/bigdatalab/streamoutput -mapper /bin/cat -reducer /usr/bin/wc

I also wrote a Python script that prints HelloWorld and called it from MapReduce; that worked as well.

Can you advise whether there is a bug in the RHadoop packages or whether I have set something up incorrectly?
syslog.txt

Antonio Piccolboni

Jun 10, 2014, 2:20:30 AM
to RHadoop Google Group

The R program you allegedly took from the wiki has an empty string as input. Please explain how that is expected to work.

Antonio


Kent Jiang

Jun 11, 2014, 2:06:42 AM
to rha...@googlegroups.com, ant...@piccolboni.info
Sorry, I posted the wrong script. I actually tested with the script below, which is directly from the wiki. I copied it into RStudio, selected all, and ran it; the job then failed during the map and reduce phases. I also tried another program where the mapper and reducer are empty (do nothing) and got the same error.

small.ints = to.dfs(1:1000)
mapreduce(
  input = small.ints, 
  map = function(k, v) cbind(v, v^2))
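
(When this runs, the usual pattern from the tutorial is to capture the return value and read the result back with from.dfs; out below is just a name I picked:)

out <- mapreduce(
  input = small.ints,
  map = function(k, v) cbind(v, v^2))
from.dfs(out)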


Antonio Piccolboni

Jun 11, 2014, 4:11:12 PM
to Kent Jiang, RHadoop Google Group
Since it works for me, the syslog just says that the R process failed, and there is no stderr, unfortunately I am unable to help. My best suggestion is to figure out why stderr is not being collected, or why R would fail without any message. In one other instance where I couldn't reproduce the error and the error messages were not informative, the user set up a test system for me to debug the issue. If that is an option available to you, I am open to taking a look. But first I'd try to figure out what happened to stderr. No matter how early the failure, there's always at least a line in stderr. Even if you had forgotten to install R on the nodes, stderr would contain something like

can't execute Rscript

or something of that sort. Empty, that's a first.
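
On YARN the per-attempt stderr normally ends up in the container logs, so assuming log aggregation is enabled, something along these lines should retrieve them for the failed job (substitute the application id of your job):

yarn logs -applicationId <application id of the failed job>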

Antonio

Kent Jiang

Jun 15, 2014, 9:01:22 AM
to rha...@googlegroups.com, jame...@gmail.com, ant...@piccolboni.info
Antonio, I resolved the issue myself. The cause was that I had installed the required R packages as a non-root user (hdfs), so they did not end up in the system library. After reinstalling the R packages in /usr/lib64/R, it works.
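
For anyone who hits the same thing, what I ended up doing on each node was roughly the following (a sketch; the tarball names are the versions I downloaded and may differ):

# install into the system library (/usr/lib64/R/library) so every user,
# including the one YARN runs tasks as, can load the packages
sudo R CMD INSTALL rmr2_3.1.1.tar.gz
sudo R CMD INSTALL rhdfs_1.0.8.tar.gz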

Antonio Piccolboni

Jun 16, 2014, 2:02:22 PM
to rha...@googlegroups.com, jame...@gmail.com, ant...@piccolboni.info
Great. Next time you have a problem, please refrain from making comments about compatibility with this or that version of Hadoop in the absence of any evidence. It doesn't help the project.



Antonio