Re: Error running rmr2 on a four node hadoop cluster (MapR M5)


Antonio Piccolboni

Jan 30, 2013, 12:44:59 PM1/30/13
to rha...@googlegroups.com
Hi
this is the most interesting line for me

201301291535_0013/attempt_201301291535_0013_m_000001_0/work/Rscript does not exist. 


Rscript is part of an R installation. After a typical installation it should be on the Unix shell PATH, that is, it should be executable by just typing Rscript at the prompt; rmr expects that. You could run a streaming job like the following
hadoop jar <path-to-streaming-jar> -input <some-small-input> -mapper env -output <some-output>
or some such and inspect the output to find what the PATH variable is set to. Also, running
type -a Rscript
on the node where the failure occurred would help. Remember that Hadoop jobs run as a Hadoop-specific user, not as "yourself", meaning your account. The fact that Rscript is on the path when you ssh to a node is neither necessary nor sufficient.
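The same check can be scripted locally as a minimal sketch (the messages printed are illustrative, not part of any Hadoop tooling):

```shell
# Check whether Rscript resolves on the current PATH -- the same resolution
# the streaming task environment must be able to perform.
if command -v Rscript >/dev/null 2>&1; then
  echo "Rscript found at: $(command -v Rscript)"
else
  echo "Rscript NOT on PATH for user $(whoami)"
fi
```

Remember to run it as the user the TaskTracker launches tasks as, not just your own login.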

Antonio


On Wednesday, January 30, 2013 9:30:20 AM UTC-8, Roberto Rösler wrote:

I tried to install rmr2 on a four-node cluster running MapR Hadoop M5. The OS is Ubuntu 12.04.1 with R 2.15.2. The installation of Hadoop was successful and all jobs are running fine. I verified this with the following example:

hadoop fs -mkdir /test/rmr2setup/wc-in
hadoop fs -put /opt/mapr/NOTICE.txt /test/rmr2setup/wc-in
hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar wordcount /test/rmr2setup/wc-in /test/rmr2setup/wc-out
hadoop fs -rmr /test/rmr2setup

Next, to install rmr2, I followed the instructions given in http://www.mapr.com/blog/harness-the-power-of-r-and-hadoop very closely. The installation of rhdfs was successful and it is possible to reach HDFS from inside R. The installation of the rmr2 package was also fine (I installed R and the packages on all nodes). But when I try to start a MapReduce job from inside R it gives me an error:

log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.NativeCodeLoader).
log4j:WARN Please initialize the log4j system properly.
java.io.FileNotFoundException: File /tmp/mapr-hadoop/mapred/local/taskTracker/mapr/jobcache/job_201301291535_0013/attempt_201301291535_0013_m_000001_0/work/Rscript does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:395)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:257)
        at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:738)
        at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:713)
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:185)
        at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)

2013-01-30 17:57:09,7995 ERROR Client fs/client/fileclient/cc/writebuf.cc:272 Thread: 139822073804544 FlushWrite failed: File part-00001, error: Stale File handle(116), pfid 2049.370.67412, off 0, fid 2049.370.67412


The environment variables are set properly in /etc/environment:

HADOOP_CMD="/opt/mapr/hadoop/hadoop-0.20.2/bin/hadoop"
HADOOP_STREAMING="/opt/mapr/hadoop/hadoop-0.20.2/contrib/streaming/hadoop-0.20.2-dev-streaming.jar"
LD_LIBRARY_PATH="/opt/mapr/lib:$LD_LIBRARY_PATH"
HADOOP_CONF="/opt/mapr/hadoop/hadoop-0.20.2/conf"
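(Editorial note: /etc/environment is only read at login via PAM, so processes launched by a daemon such as the TaskTracker may never see these values. A quick sketch to confirm what a given shell actually sees, using the variable names from the post above:)

```shell
# Print the Hadoop-related variables as the current shell sees them.
# Run as the user that actually launches the tasks, not just your login shell.
for v in HADOOP_CMD HADOOP_STREAMING HADOOP_CONF; do
  eval "val=\$$v"
  echo "$v=${val:-<unset>}"
done
```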


 Any ideas where this problem comes from?

Thanks in advance

Roberto

Roberto Rösler

Jan 30, 2013, 2:04:31 PM1/30/13
to rha...@googlegroups.com
Hi Antonio,

thanks for your fast reply. I figured out that the problem is not related to rmr2. Even the simple streaming job throws an error:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.NativeCodeLoader).
log4j:WARN Please initialize the log4j system properly.
java.io.FileNotFoundException: File /tmp/mapr-hadoop/mapred/local/taskTracker/root/jobcache/job_201301291535_0024/attempt_201301291535_0024_m_000000_0/work/env does not exist.

I think, I will call the experts from MapR.

Thanks

David Boyd

Jul 29, 2013, 4:42:44 PM7/29/13
to rha...@googlegroups.com
Roberto:
    Did you get an answer for this? I am having the exact same problem.

David Boyd

Jul 30, 2013, 10:57:58 AM7/30/13
to rha...@googlegroups.com
All:
   Well, I found my own answer and thought I would post it for the community.
It turns out that, at least for MapR (and who knows which other Hadoop implementations), the
environment for the mapper tasks does not inherit the normal user environment (whatever user
your Hadoop jobs run as): /etc/profile.d/*, /etc/profile, etc. are not sourced. I had to explicitly
set the PATH variable in mapred-site.xml via mapred.map.child.env and mapred.reduce.child.env
on each node. There are probably other things that need to be set.

It would be nice if there were a way to get the Hadoop job user to actually inherit
the normal profiles.

Ravi

Jul 30, 2013, 12:08:27 PM7/30/13
to rha...@googlegroups.com

David,

Could you please tell us what values you set for mapred.map.child.env and mapred.reduce.child.env in mapred-site.xml?

Thanks

Ravi

Antonio Piccolboni

Jul 30, 2013, 12:28:07 PM7/30/13
to RHadoop Google Group
Interesting information, thanks. It raises the question of why the MapR-provided installation guide for RHadoop doesn't mention this problem and its solution. Maybe this is a good place to share a trick I use to check the environment as set up for a streaming map or reduce script, including those created by rmr2 behind the scenes. I run a job like the following

$HADOOP_CMD jar $HADOOP_STREAMING -input <single line text file> -output <some output path> -mapper env

That way you can check what the PATH variable and all the other variables are set to in the environment set up by streaming.


Antonio 




--
post: rha...@googlegroups.com ||
unsubscribe: rhadoop+u...@googlegroups.com ||
web: https://groups.google.com/d/forum/rhadoop?hl=en-US
---
You received this message because you are subscribed to the Google Groups "RHadoop" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rhadoop+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

David Boyd

Aug 21, 2013, 11:44:56 AM8/21/13
to rha...@googlegroups.com, ant...@piccolboni.info
Actually it was your tip on another email thread that helped me find the answer.

David Boyd

Aug 21, 2013, 11:45:21 AM8/21/13
to rha...@googlegroups.com
Ravi:
   In my case I added the Revolution R binaries to the PATH as follows:
<property>
  <name>mapred.map.child.env</name>
  <value>PATH=$PATH:/opt/revr/bin</value>
  <description>User-added environment variables for the task tracker child
  processes. Example:
  1) A=foo    This will set the env variable A to foo
  2) B=$B:c   This will inherit the tasktracker's B env variable.
  </description>
</property>

<property>
  <name>mapred.reduce.child.env</name>
  <value>PATH=$PATH:/opt/revr/bin</value>
</property>