RhipeMapReduce binary and shell script - How they fit in the scheme of things


hh

Jun 9, 2015, 12:18:47 PM
to rh...@googlegroups.com
Hi,

I am currently attempting to get Tessera to work on a cluster running RHEL. The system administrator has compiled two versions of R: one built with 'normal' compilation, and the other compiled against an external BLAS library, in this case Intel's MKL.

From the testing I've done, if I use the bashRhipeArchive function to package R and its libraries, the normally compiled version of R works, but the MKL version doesn't.

For the MKL version, I know there are environment variables I need to export before starting R. From what I've read on this mailing list, the 'RhipeMapReduce.sh' script and the 'RhipeMapReduce' binary seem to be the key, but I have no idea how these two fit into the scheme of things in RHIPE.

I suspect I need to create a custom RhipeMapReduce.sh script with the MKL environment variable declarations.
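If so, a minimal sketch of such a wrapper might look like the following. (The MKL install path, the library subdirectory, and the thread setting here are all assumptions; check where MKL actually lives on your cluster.)

```shell
#!/bin/sh
# Hypothetical custom runner: export the MKL environment, then hand off
# to the stock RhipeMapReduce.sh shipped inside the unpacked archive.
# /opt/intel/mkl is a placeholder path.
export MKLROOT=/opt/intel/mkl
export LD_LIBRARY_PATH="$MKLROOT/lib/intel64:$LD_LIBRARY_PATH"
export MKL_NUM_THREADS=1   # avoid oversubscribing Hadoop task slots
exec sh ./R.Pkg/library/Rhipe/bin/RhipeMapReduce.sh "$@"
```

You would then point rhoptions(runner = ...) at this script instead of the stock one, and the script itself has to reach the node (e.g. by placing it inside the archive).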

I am also unsure how the worker nodes in Hadoop interact with the package created by the bashRhipeArchive function. It seems I only need to specify rhoptions['runner'] once on the master node, so how do the worker nodes know to use this package?

Any help to understand the general architecture of RHIPE is much appreciated.

Thanks in advance.

yi-chia wang

Jul 5, 2015, 9:20:50 PM
to rh...@googlegroups.com
When I type in this:

$ R CMD INSTALL Rhipe_0.73.1.tar.gz

the error message is:

** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
Rhipe requires HADOOP_HOME or HADOOP or HADOOP_BIN environment variable to be present
 $HADOOP/bin/hadoop or $HADOOP_BIN/hadoop should exist
Rhipe: HADOOP_BIN is missing, using $HADOOP/bin
HADOOP_HOME missing
HADOOP_CONF_DIR missing, you are probably going to have a problem running RHIPE.
HADOOP_CONF_DIR should be the location of the directory that contains the configuration files

------------------------------------------------
| Please call rhinit() else RHIPE will not run |
------------------------------------------------

How can I set the Hadoop environment variables for RHIPE?

I can't solve this problem; can anyone help me?

Thanks!
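For reference, the error output above names exactly the variables Rhipe looks for. Something along these lines in your shell profile before starting R should clear it, provided the paths exist; the paths below are placeholders for your cluster's real Hadoop install and configuration directories:

```shell
# Placeholder paths -- substitute your actual Hadoop locations.
export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_BIN="$HADOOP_HOME/bin"       # $HADOOP_BIN/hadoop must exist
export HADOOP_CONF_DIR=/etc/hadoop/conf    # directory holding the config files
```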


hh wrote on Tuesday, June 9, 2015 at 12:18:47 PM UTC-4:

Jeremiah Rounds

Jul 5, 2015, 9:30:18 PM
to rh...@googlegroups.com, hon....@student.uts.edu.au
A few years ago I wrote bashRhipeArchive, so I guess I can shed some light on this, even though I have not actively used Rhipe in about a year and a half (because of school). I am not sure if some of the back end has been cleaned up or simplified since then, so first, a reference:

You will note at the bottom:
rhoptions(zips = "/yourloginname/bin/R.Pkg.tar.gz")

That is how Rhipe knows. The zips value goes to Hadoop, and Hadoop uses the distributed cache to create a symbolic link to a current copy of the tar.gz (copying only when it has changed) in the working directory of the node. Whatever you tell zips goes from HDFS to the node and gets untarred and gunzipped.


Now you will also note:
rhoptions(runner = "sh ./R.Pkg/library/Rhipe/bin/RhipeMapReduce.sh")


The runner is a shell script that needs to accomplish two things on a node:
1) Set up the environment
2) Run RhipeMapReduce appropriately (a C program compiled against the R source to act as an interpreter).
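Conceptually, a stripped-down runner doing those two jobs looks something like this. (This is a sketch of the idea, not the shipped script; the environment variables and the layout of the unpacked archive are assumptions.)

```shell
#!/bin/sh
# Conceptual sketch of a Rhipe runner on a worker node:
# 1) set up the environment for the R tree unpacked from the archive,
# 2) exec the RhipeMapReduce binary shipped inside it.
export R_HOME="$PWD/R.Pkg"                        # the untarred archive
export LD_LIBRARY_PATH="$R_HOME/lib:$LD_LIBRARY_PATH"
exec ./R.Pkg/library/Rhipe/bin/RhipeMapReduce "$@"
```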

R.tar.gz is an archive with *every single compiled .so file I could find for R to run*, including those needed for packages; if you unzip it and inspect it you will find that. It just so happens that RhipeMapReduce is also distributed with the compiled R.

That said, you could easily run into problems using bashRhipeArchive for a non-standard sort of front end. Its purpose in life as a function is to automate something we used to do a lot by hand. What you need to do in that case is either fix up the archive with whatever is missing, or rebuild it by following what I did in bashRhipeArchive. You need to build an R instance that will run on the node from just the tar.gz, and it needs to have all the packages you need on the node (including Rhipe, so it will have RhipeMapReduce).
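A rough by-hand version of that rebuild might look like the following, where every path is a placeholder to adapt to your cluster:

```shell
# Sketch only: package a self-contained R tree (libraries, .so files,
# and the Rhipe package, which carries RhipeMapReduce) and ship it to HDFS.
cp -r /opt/R-mkl R.Pkg                          # placeholder: a relocatable R build
cp /opt/intel/mkl/lib/intel64/*.so R.Pkg/lib/   # add the MKL shared objects
tar -czf R.Pkg.tar.gz R.Pkg                     # must unpack as ./R.Pkg/... on the node
hadoop fs -put R.Pkg.tar.gz /yourloginname/bin/R.Pkg.tar.gz
```

After that, zips and runner are set with rhoptions() as shown earlier in this message, and the distributed cache takes care of delivering the archive to each worker.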