A few years ago I wrote bashRhipeArchive, so I can shed some light on this, even though I haven't actively used Rhipe in about a year and a half (because of school). I am not sure whether some of the back end has been cleaned up or simplified since then, so first, a reference:
You will note at the bottom:
rhoptions(zips = "/yourloginname/bin/R.Pkg.tar.gz")
That is how Rhipe knows. The zips archive goes to Hadoop, and Hadoop's distributed cache opens a symbolic link to a current copy of the tar.gz (copying it only when it has changed) in the working directory of each node. Whatever you point zips at goes from the HDFS to the node and gets untar-gzipped there.
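For concreteness, the build-and-ship step can be sketched like this. The R.Pkg tree here is just a stand-in for your compiled R installation, and the HDFS path mirrors the zips option above; treat all paths as assumptions to adapt.

```shell
# Hypothetical sketch: build the archive that `zips` points at and ship
# it to HDFS. R.Pkg is a placeholder for your compiled R tree.
mkdir -p R.Pkg/library
tar -czf R.Pkg.tar.gz R.Pkg                  # the archive `zips` names
# On a machine with Hadoop installed, push it to HDFS (cluster-only step):
#   hadoop fs -put -f R.Pkg.tar.gz /yourloginname/bin/R.Pkg.tar.gz
tar -tzf R.Pkg.tar.gz                        # each node receives exactly this
```

Because the distributed cache only re-copies the archive when it changes, re-running jobs after the first upload is cheap.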
Now you will also note:
rhoptions(runner = "sh ./R.Pkg/library/Rhipe/bin/RhipeMapReduce.sh")
The runner is a shell script that needs to accomplish two things on a node:
1) Set up the environment
2) Appropriately run RhipeMapReduce (a C program, compiled against the R source, that acts as the R interpreter).
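A minimal runner covering those two steps might look like the sketch below. The environment variables and the unpack location are my assumptions about the layout, not the exact contents of the shipped RhipeMapReduce.sh; compare against the real script inside the archive.

```shell
# Hypothetical runner sketch, written to a file so you can diff it against
# the real ./R.Pkg/library/Rhipe/bin/RhipeMapReduce.sh from the archive.
cat > RhipeMapReduce.sh <<'EOF'
#!/bin/sh
# 1) set up the environment: point the loader and R at the unpacked
#    archive, which the distributed cache symlinks into the task's
#    working directory (R_HOME and lib path are assumed locations)
export R_HOME=./R.Pkg
export LD_LIBRARY_PATH=./R.Pkg/lib:$LD_LIBRARY_PATH
# 2) run RhipeMapReduce, the compiled interpreter shipped with Rhipe
exec ./R.Pkg/library/Rhipe/bin/RhipeMapReduce
EOF
chmod +x RhipeMapReduce.sh
```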
The R.Pkg.tar.gz archive bundles *every single compiled .so file I could find for R to run*, including those needed for packages; if you unzip it and inspect it you will find them all. It just so happens that RhipeMapReduce is also distributed inside that compiled R.
That said, you could easily run into problems using bashRhipeArchive with a non-standard sort of front end. Its purpose in life as a function is to automate something we used to do a lot by hand. In that case, you need to either fix up the archive with whatever is missing, or rebuild it by following what I did in bashRhipeArchive. You need to build an R instance that will run on the node from just the tar.gz, and it needs to include every package you need on the node (including Rhipe itself, so that RhipeMapReduce is present).
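Before pointing zips at a rebuilt archive, it is worth listing its contents to confirm the node-side essentials are present. The layout below is an assumption matching the runner path earlier, and this demo fakes the tree with empty files purely to show the check.

```shell
# Fake a minimal tree with the pieces a node needs, then verify the
# archive lists them: R's shared libraries and the RhipeMapReduce
# binary that ships inside the Rhipe package. (Stand-in files only.)
mkdir -p R.Pkg/lib R.Pkg/library/Rhipe/bin
touch R.Pkg/lib/libR.so                       # stand-in for R's .so files
touch R.Pkg/library/Rhipe/bin/RhipeMapReduce  # stand-in for the C binary
tar -czf R.Pkg.tar.gz R.Pkg
tar -tzf R.Pkg.tar.gz | grep 'Rhipe/bin/RhipeMapReduce'  # must be present
```

Run the same grep against your real archive: if RhipeMapReduce or a required .so is missing from the listing, the node-side R will fail before your job code ever runs.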