Working of R on Hadoop (Installation): Step-by-Step Guide


$hekhar

Apr 18, 2011, 1:24:46 AM
to Bangalore R Users - BRU
Hi,
As we are all aware, industries now face a big data problem: data volumes are growing tremendously (to the petabyte scale), and companies are trying hard to solve it.
Since existing standalone systems fail to scale, they are turning to cluster-based solutions such as Hadoop.

This post is aimed at beginners who want to use the statistical and graphical capabilities of R in a distributed environment such as Hadoop.

Before going through the step-by-step installation, let me give you the details of my system configuration.

-----------------------------------
System Configuration | Version
-----------------------------------
OS                   | CentOS 5.5
Hadoop               | 0.20.2
Java                 | 1.6.0_22
R                    | 2.12.0
Rhipe                | 0.63
protobuf             | 2.3.0
-----------------------------------

I have a two-node cluster; each machine has 1 GB of RAM and a 2.66 GHz processor.

1. Installing R

R must be installed on every node on which Hadoop runs.

(A) Run the following command (as root) to install R:

yum install R
(B) The installation might fail because the EPEL signing key (RPM-GPG-KEY-EPEL) is missing. In that case, download the key and place it under /etc/pki/rpm-gpg.
(C) Check that the R command works by typing "R" in a terminal window; it should start the R console.


2. Installing Rhipe

Prerequisites
The following packages must be installed before installing the RHIPE package on Hadoop:
I. R
II. protobuf (version 2.3.0; its pkg-config file is protobuf.pc)
This is Google's Protocol Buffers library, which RHIPE uses to serialize R objects. One benefit is that data produced by RHIPE can be read from other languages such as Python, C and Java, although RHIPE cannot serialize every kind of R object.

Installing protobuf
I. Download protobuf-2.3.0.tar.gz from http://code.google.com/p/protobuf/downloads/list.

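II. Unpack the archive and run the following commands, in order, from the unpacked directory:
A. sh configure
B. make
C. make install (as root)

Then set the environment variable PKG_CONFIG_PATH=/usr/local/lib/pkgconfig, the directory where make install places protobuf.pc by default, so that pkg-config can find it.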
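Steps A to C and the PKG_CONFIG_PATH setting can be chained so that the first failing step stops the build. This is a sketch under assumptions: the function name build_protobuf is hypothetical, the archive is assumed to unpack into a directory named after it (protobuf-2.3.0/), and the default /usr/local prefix is assumed.

```shell
#!/bin/sh
# Hypothetical wrapper around steps A-C. Assumes the tarball unpacks into a
# directory with the same base name (protobuf-2.3.0/) and that configure uses
# its default /usr/local prefix. `make install` writes there, so run as root.
build_protobuf() {
    tarball=$1
    srcdir=$(basename "$tarball" .tar.gz)
    # Chain the steps: the first failure aborts the rest
    tar xzf "$tarball" &&
        (cd "$srcdir" && sh configure && make && make install) || return 1
    # Let pkg-config find protobuf.pc, keeping any existing search path
    PKG_CONFIG_PATH=/usr/local/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}
    export PKG_CONFIG_PATH
}
```

Usage: `build_protobuf protobuf-2.3.0.tar.gz`. The export only lasts for the current shell, so add `export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig` to .bashrc as well.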

III. Check that protobuf is installed correctly:
pkg-config --modversion protobuf
The output should be 2.3.0.

IV. Check that the protobuf libraries are installed correctly:
pkg-config --libs protobuf
The output should be similar to -pthread -L/usr/local/lib -lprotobuf -l2
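Checks III and IV can also be scripted. The two helpers below are hypothetical names, not part of protobuf or pkg-config; each takes the text that pkg-config prints, and the library check only looks for the two flags that matter here (-L/usr/local/lib and -lprotobuf) rather than the exact string, since the remaining flags can vary between systems.

```shell
#!/bin/sh
# Hypothetical helpers for checks III and IV; each inspects pkg-config output.

# $1: output of `pkg-config --modversion protobuf`; $2: expected version
protobuf_version_ok() {
    expected=${2:-2.3.0}
    if [ "$1" != "$expected" ]; then
        echo "expected protobuf $expected, got '$1'" >&2
        return 1
    fi
}

# $1: output of `pkg-config --libs protobuf`; $2: library dir to expect
protobuf_libs_ok() {
    libdir=${2:-/usr/local/lib}
    for flag in "-L$libdir" -lprotobuf; do
        # Pad with spaces so only whole flags match
        case " $1 " in
            *" $flag "*) ;;
            *) echo "missing $flag in: $1" >&2; return 1 ;;
        esac
    done
}
```

Usage: `protobuf_version_ok "$(pkg-config --modversion protobuf)" && protobuf_libs_ok "$(pkg-config --libs protobuf)" && echo "protobuf OK"`.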

Installing RHIPE
I. Download RHIPE from http://www.stat.purdue.edu/~sguha/rhipe/ and place the contents in a directory named Rhipe, for example /home/user/Rhipe.

II. Add the following environment variables to .bashrc, exported so that R CMD INSTALL can see them:

export HADOOP=/path/to/hadoop    # location of the Hadoop installation
export HADOOP_LIB=$HADOOP/lib
export HADOOP_CONF_DIR=$HADOOP/conf
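Before running R CMD INSTALL, it may help to confirm that these variables are set and point at real directories. The function check_hadoop_env below is a hypothetical sketch; it checks the values in the current shell but does not verify that they are exported.

```shell
#!/bin/sh
# Hypothetical pre-install check: HADOOP, HADOOP_LIB and HADOOP_CONF_DIR must
# each be set and name an existing directory. Does not check export status.
check_hadoop_env() {
    status=0
    for var in HADOOP HADOOP_LIB HADOOP_CONF_DIR; do
        # Read the variable whose name is stored in $var
        eval "dir=\${$var:-}"
        if [ -z "$dir" ]; then
            echo "$var is not set" >&2
            status=1
        elif [ ! -d "$dir" ]; then
            echo "$var=$dir is not a directory" >&2
            status=1
        fi
    done
    return "$status"
}
```

Usage: after `source ~/.bashrc`, run `check_hadoop_env && echo "Hadoop variables look fine"`.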

III. Execute the following command from the directory /home/user/Rhipe.

R CMD INSTALL Rhipe

As a check, open the R console and load the package with library(Rhipe); it should not throw an error.


The installation fails if the dynamic linker cannot find the protobuf shared library. The error looks like this:

Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared object '/usr/lib64/R/library/Rhipe/libs/Rhipe.so':
libprotobuf.so.6: cannot open shared object file: No such file or directory
ERROR: loading failed
* removing ‘/usr/lib64/R/library/Rhipe’

To add /usr/local/lib to the dynamic linker's search path, create the file Protobuf-x86.conf under /etc/ld.so.conf.d/ containing the single line /usr/local/lib.
If libprotobuf.so is still not found after creating Protobuf-x86.conf, the linker cache /etc/ld.so.cache is stale. Rebuild it by running ldconfig as root.
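The fix can be sketched as one idempotent function. The name register_libdir and its optional arguments are hypothetical; the defaults match the paths in the text, and the second argument exists only so the function can be tried against a scratch directory, in which case ldconfig is skipped.

```shell
#!/bin/sh
# Hypothetical sketch of the ld.so.conf.d fix. Defaults match the text; the
# optional arguments only allow trying it on a scratch directory.
register_libdir() {
    libdir=${1:-/usr/local/lib}
    confdir=${2:-/etc/ld.so.conf.d}
    conf=$confdir/Protobuf-x86.conf
    # Append the directory only if it is not already listed (safe to re-run)
    if ! grep -qxF "$libdir" "$conf" 2>/dev/null; then
        printf '%s\n' "$libdir" >> "$conf" || return 1
    fi
    # Rebuild /etc/ld.so.cache, but only for the real system configuration
    if [ "$confdir" = /etc/ld.so.conf.d ]; then
        ldconfig
    fi
}
```

As root, run `register_libdir` with no arguments; afterwards `ldconfig -p | grep libprotobuf` should list libprotobuf.so.6.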



Now start the R console and run library(Rhipe); it should load without any error. This concludes the Rhipe installation.
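As an extra check, ldd lists the shared libraries that Rhipe.so depends on, and any entry reported as "not found" (such as libprotobuf.so.6 in the error above) is one the dynamic linker still cannot resolve. The filter missing_libs below is a hypothetical helper, and the Rhipe.so path in the usage line is the one from the author's error message, so it may differ on your system.

```shell
#!/bin/sh
# Hypothetical filter: read `ldd` output on stdin and print the name of every
# shared library that the dynamic linker could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

Usage: `ldd /usr/lib64/R/library/Rhipe/libs/Rhipe.so | missing_libs` should print nothing once the installation is working.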

Regards,
Som Shekhar

sudheer...@gmail.com

Jan 11, 2013, 2:50:20 AM
to bru...@googlegroups.com
Hi,

I followed the above procedure and encountered the following errors:


/root/Rhipe/src/message.cc:153: undefined reference to `TYPEOF'
/root/Rhipe/src/message.cc:184: undefined reference to `LENGTH'
/root/Rhipe/src/message.cc:185: undefined reference to `RAW'
/root/Rhipe/src/message.cc:156: undefined reference to `LENGTH'
/root/Rhipe/src/message.cc:158: undefined reference to `LOGICAL'
/root/Rhipe/src/message.cc:174: undefined reference to `LENGTH'
/root/Rhipe/src/message.cc:175: undefined reference to `INTEGER'
/root/Rhipe/src/message.cc:179: undefined reference to `LENGTH'
/root/Rhipe/src/message.cc:180: undefined reference to `REAL'
/root/Rhipe/src/message.cc:193: undefined reference to `COMPLEX'
/root/Rhipe/src/message.cc:194: undefined reference to `COMPLEX'
/root/Rhipe/src/message.cc:190: undefined reference to `LENGTH'
/root/Rhipe/src/message.cc:206: undefined reference to `STRING_ELT'
/root/Rhipe/src/message.cc:206: undefined reference to `R_NaString'
/root/Rhipe/src/message.cc:209: undefined reference to `STRING_ELT'
/root/Rhipe/src/message.cc:209: undefined reference to `R_CHAR'


Can you please help me?

Thanks for the help.

Shekhar

Feb 9, 2013, 11:53:56 AM
to bru...@googlegroups.com, sudheer...@gmail.com
What were you trying to do?