RHadoop installation - on every node including name nodes?


11580

Jun 3, 2014, 2:05:54 AM
to rha...@googlegroups.com
I have installed R and RHadoop on all of my data nodes. I am wondering if I should install them on the name nodes (primary and secondary) as well?


Antonio Piccolboni

Jun 3, 2014, 8:56:16 AM
to RHadoop Google Group

Any node that can run a task, plus one computer to use as the master (it could be one of the nodes, but it is probably better to keep it separate for load-balancing reasons).
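
A minimal smoke test, in case it helps make this concrete (a sketch using the standard rmr2 tutorial round trip; run it from whichever machine you pick as the R master, with rmr2 installed there and on every task-running node):

library(rmr2)

# Write a small vector to HDFS, square each value in a map task,
# and read the result back. Any task node missing R, Rscript, or
# the required packages will fail its task attempts.
small.ints <- to.dfs(1:10)
result <- mapreduce(input = small.ints,
                    map = function(k, v) keyval(v, v^2))
from.dfs(result)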


11580

Jun 3, 2014, 10:49:49 AM
to rha...@googlegroups.com, ant...@piccolboni.info
Thanks for the quick reply, Antonio. Just to clarify: do you mean I DO need to install RHadoop on all nodes, including the name nodes and the backup (secondary name node)?

Basically, the issue I am running into is this:

1. I am trying to do time series analysis using RHadoop. So far I have only installed RHadoop on the data nodes, and I run my time series code (mapper and reducer) from RStudio Server on one of the data nodes. For my MONTHLY time series data this actually worked: the job eventually finished successfully, but only after a few failed attempts, and the log always contains the following error:

java.io.IOException: Cannot run program "Rscript": error=2, No such file or directory 
 
2. Now that the monthly time series is done, I am moving on to weekly prediction, which has a lot more data than monthly, and I am running into a situation where the job cannot finish and eventually fails. Watching the RStudio console, I see the reducer go up to 16%, then 48%, then get stuck there for a long time (say 10 minutes), then drop back down to 32%, then 16%, then climb back to 48%, and so on, until the job eventually fails.

When I look into the error log, I see the same "Rscript cannot be found" message as above. But Rscript is installed on the data nodes and is on my PATH; if I type Rscript in a Linux terminal, I get the expected output. As I said, though, I don't have R or RHadoop installed on any of the name nodes, so my guess is: now that I am running a larger data set (weekly vs. monthly), maybe Hadoop is trying to use the name nodes as data nodes, and then it cannot find Rscript?
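
One way to test that guess (a probe sketch using only standard rmr2 calls): run a trivial job whose map task emits the hostname it ran on. Nodes that lack Rscript never appear in the output, because their attempts die before R starts; those show up only as failed attempts in the JobTracker. Note also that Hadoop task processes do not necessarily inherit a login shell's PATH, so Rscript working in a terminal does not guarantee a streaming task can see it.

library(rmr2)

# Each successful map task reports the node it ran on; nodes missing
# Rscript cannot run the task at all, so they are absent from the
# output and surface only as failed attempts in the job tracker.
probe <- mapreduce(
  input = to.dfs(1:100),
  map   = function(k, v) keyval(Sys.info()[["nodename"]], 1))
table(keys(from.dfs(probe)))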

3. Besides the Rscript error, I also see the rJava error below. The thing is, until I started on the weekly time series I was able to finish jobs successfully despite also seeing the rJava error (which made me think maybe it doesn't matter). Right now I am basically stuck on weekly, which is making me look at the whole thing again.

Please shed some light on this.

The following objects are masked from ‘package:base’:

as.Date, as.Date.numeric

Loading required package: timeDate
Loading required package: methods
This is forecast 5.4

Loading required package: rhdfs
Loading required package: rJava
Error : .onLoad failed in loadNamespace() for 'rJava', details:
call: dyn.load(file, DLLpath = DLLpath, ...)
error: unable to load shared object '/usr/lib64/R/library/rJava/libs/rJava.so':
libjvm.so: cannot open shared object file: No such file or directory
Failed with error:  ‘package ‘rJava’ could not be loaded’
Warning in FUN(c("forecast", "timeDate", "zoo", "rhdfs", "rJava", "rmr2",  :
can't load rhdfs
Loading required package: rJava
Error : .onLoad failed in loadNamespace() for 'rJava', details:
call: dyn.load(file, DLLpath = DLLpath, ...)
error: unable to load shared object '/usr/lib64/R/library/rJava/libs/rJava.so':
libjvm.so: cannot open shared object file: No such file or directory
Warning in FUN(c("forecast", "timeDate", "zoo", "rhdfs", "rJava", "rmr2",  :
can't load rJava
Loading required package: rmr2
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: bitops
Loading required package: digest
Loading required package: reshape2
Loading required package: stringr
Loading required package: plyr
Loading required package: caTools
sh: cat: command not found
Error in writeBin(.Call("typedbytes_writer", objects, native, PACKAGE = "rmr2"),  :
ignoring SIGPIPE signal
Calls: <Anonymous> ... keyval.writer -> <Anonymous> -> typedbytes.writer -> writeBin
No traceback available
Error during wrapup:
Execution halted
sh: rm: command not found
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262) 
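
For what it's worth, the libjvm.so failure above is usually an environment problem on the node where R runs, rather than anything specific to rmr2: R's dynamic loader cannot find the JVM shared library. A common fix, sketched below under the assumption of a typical JDK layout (the exact jre/lib path varies by Java version and vendor), is to reconfigure R's Java settings, or to put libjvm.so's directory on the loader path before starting R or RStudio Server:

# Re-detect Java settings for R (run once per affected node):
#   sudo R CMD javareconf
# Or export the JVM library directory before launching R:
#   export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server:$LD_LIBRARY_PATH
# Afterwards, rJava and rhdfs should load cleanly in R:
library(rJava)
library(rhdfs)
hdfs.init()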

Antonio Piccolboni

Jun 16, 2014, 2:06:14 PM
to rha...@googlegroups.com, ant...@piccolboni.info
You are seeking a yes/no answer; unfortunately, neither is correct given the way you formulated your question. I stand by my first answer. It may or may not coincide with a yes to your question, depending on your actual Hadoop configuration. If a name node doesn't run tasks and is not selected as the R master node, it doesn't need rmr2 installed.


Antonio