Access a Hadoop cluster from my machine with the rhdfs package without installing RHadoop/R on the nodes


Hbr

Dec 18, 2014, 2:36:31 AM
to rha...@googlegroups.com
Hi,
At my work we have a large Hadoop cluster (Hortonworks 2.1).
I want to connect RStudio to that Hadoop/HDFS cluster and access the data. Note that I can NOT ask for R or RHadoop/rmr etc. to be installed on every, or any, of the cluster's job/name/task nodes.

Is this even possible, or is there another package for it? I looked at RHive, but even that needs installation on all nodes.

Simply put: I currently access Hadoop HDFS with tools like the SQuirreL client, Hue, the Hive CLI etc. I want to do the same from R (without needing anything from the Hadoop admins) so that I can copy HDFS files into a data frame or vice versa.


Thanks
hbr

Antonio Piccolboni

Dec 18, 2014, 11:27:39 AM
to RHadoop Google Group
rhdfs doesn't require any software installation on the cluster, so it seems to fit your constraints. As for the specifics of RStudio, there are some issues with the way RStudio starts R with respect to environment variables, so I recommend you set them explicitly, using Sys.setenv or the .Renviron file.
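
For anyone following along, here is a minimal sketch of what that could look like from an R session. It assumes a Hadoop client already configured for the cluster is present on the local machine; the launcher path /usr/bin/hadoop and the HDFS paths under /user/hbr are hypothetical placeholders, not values from this thread. HADOOP_CMD is the variable rhdfs checks when it loads, so it has to be set before library(rhdfs).

```r
# Hypothetical path: point this at the hadoop launcher on YOUR machine.
Sys.setenv(HADOOP_CMD = "/usr/bin/hadoop")

# Only touch HDFS when both the client and the rhdfs package are present.
have_client <- file.exists(Sys.getenv("HADOOP_CMD")) &&
  requireNamespace("rhdfs", quietly = TRUE)

if (have_client) {
  library(rhdfs)
  hdfs.init()                                  # connect to the configured cluster
  print(hdfs.ls("/user/hbr"))                  # hypothetical HDFS directory
  hdfs.get("/user/hbr/data.csv", "data.csv")   # HDFS -> local file
  df <- read.csv("data.csv")                   # local file -> data frame
  write.csv(df, "out.csv", row.names = FALSE)  # data frame -> local file
  hdfs.put("out.csv", "/user/hbr/out.csv")     # local file -> HDFS
}
```

To make the setting survive RStudio restarts, the same variable can go into ~/.Renviron as a single line of the form HADOOP_CMD=/usr/bin/hadoop (again, adjust the path).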


Hbr

Dec 24, 2014, 3:46:38 AM
to rha...@googlegroups.com, ant...@piccolboni.info
Thanks Antonio. 
Let's say we start using RHadoop or RHIPE. Does this mean that anything I normally do in R, say an lm or clustering (which so far has run only on my single machine), is made to run in parallel on the MR/cluster infrastructure? Or
does using RHadoop/RHIPE just give us a way to run our own MR programs from R itself, i.e. instead of running MR from Java/Python and moving the result set into R, we get to run MR from R?
Please share your inputs.
Thank you.

Antonio Piccolboni

Dec 24, 2014, 11:57:46 AM
to RHadoop Google Group
I can't speak for RHIPE, but for RHadoop it's your second option (write your own MR, no magic parallelization of existing packages). All the products that promise you don't need to think about MR offer a fixed selection of algorithms; if you like those, great, otherwise it's the highway. rmr2 is for algorithm developers. There is value in both approaches, as long as there is clarity instead of bogus marketing.
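
To illustrate what "write your own MR" can mean for something like lm, here is a rough sketch in base R only. It does not use the rmr2 API: chunks of the built-in mtcars data stand in for HDFS input splits, and the names map_chunk and reduce_stats are made up for this example. The idea is that each chunk emits the partial sums X'X and X'y, the reduce step adds them up, and the normal equations are solved once at the end. With rmr2, functions of this kind would be the ones handed to mapreduce().

```r
# Base R only: chunks of mtcars stand in for HDFS input splits.
chunks <- split(mtcars, rep(1:4, length.out = nrow(mtcars)))

# "Map": each chunk emits its partial sufficient statistics X'X and X'y.
map_chunk <- function(d) {
  X <- cbind(intercept = 1, wt = d$wt, hp = d$hp)
  list(xtx = crossprod(X), xty = crossprod(X, d$mpg))
}

# "Reduce": the statistics are additive, so summing them combines chunks.
reduce_stats <- function(a, b) {
  list(xtx = a$xtx + b$xtx, xty = a$xty + b$xty)
}

total <- Reduce(reduce_stats, lapply(chunks, map_chunk))

# Solve the normal equations once on the small combined matrices.
beta <- drop(solve(total$xtx, total$xty))

# Compare with the ordinary single-machine fit.
reference <- coef(lm(mpg ~ wt + hp, data = mtcars))
print(all.equal(unname(beta), unname(reference)))
```

The coefficients should agree with lm up to numerical tolerance, but nothing here happens automatically: the decomposition into map and reduce steps has to be designed by hand for each algorithm.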


Antonio