Connecting RStudio running on Windows client machine to remote Hadoop cluster.

Michael Bremen

Apr 8, 2015, 6:17:35 PM
to rha...@googlegroups.com
I know this question has been asked before, but I could not find a complete answer on how to do this.
Please post links to any manual that gives a step-by-step explanation of connecting the two.

My requirement is simple:

I have a Windows 7 (64-bit) workstation that I use for running R and RStudio. I recently got access to an edge node of a larger Hadoop cluster, and I can log in to the edge node using PuTTY from my workstation. I can see that MapR is installed on the edge node. The edge node does not have R or RStudio. My Windows workstation does not have MapR Hadoop or any other Hadoop distribution installed.

I would like to use this setup to connect RStudio to the edge node and/or, through it, to the MapR Hadoop cluster. Once this is done, I should be able to run R scripts from my workstation that initiate jobs on the remote MapR cluster. The results would then be available to read into an R variable for further analysis. If assigning results directly to a variable is not possible, the results could be written to an output file in HDFS and then read into a variable using read.table or other methods.

I am new to Hadoop and R, so please be specific when giving instructions, such as:

a. What software do I need to install on my Windows workstation?
b. What software do I need to install on the edge node?
c. What software do I need to install on other nodes of the Hadoop cluster, such as the namenode, resource manager, etc.?
d. What privileges are required for the user accounts on Windows, the edge node, etc.?

Please excuse me if I am asking for trivial details; I would really appreciate your help in this matter.