Good evening.
This is the scenario:
Cluster with head nodes:
  NameNode in HA
    NameNode server #1
    NameNode server #2
  JobTracker
  Cloudera Manager
Service node:
  Hue
  R-Studio + R + RHadoop
Slave nodes:
  N nodes with:
    DataNode
    TaskTracker
    R + RHadoop
Problem:
Before deploying NameNode High Availability, we successfully deployed R-Studio Server + R + RHadoop. We successfully ran several jobs from R, interacting with rmr and rhdfs.
Next step: we deployed High Availability and repeated the tests, but basic HDFS operations from within R took too long. Afterwards, we defined several environment variables: HADOOP_CMD, HADOOP_HOME, and HADOOP_STREAMING. In the tests that followed, basic HDFS operations did not work at all.
It seems that RHadoop does not know how to interact with the NameNode in HA mode.
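A minimal sketch of one check that might help narrow this down. With HA enabled, HDFS clients address a logical nameservice instead of a single NameNode host, so the Hadoop client configuration visible on the R-Studio node needs the HA client properties. The Python below parses the text of an hdfs-site.xml and lists which of those properties are missing. The property names are the standard HDFS HA ones; the nameservice "mycluster", the IDs "nn1"/"nn2", the host name, and both function names are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET


def load_hadoop_properties(xml_text):
    """Parse the text of a Hadoop *-site.xml file into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def missing_ha_client_keys(props):
    """Return the HA client property names that are absent from props."""
    nameservices = [
        ns.strip()
        for ns in props.get("dfs.nameservices", "").split(",")
        if ns.strip()
    ]
    if not nameservices:
        return ["dfs.nameservices"]

    missing = []
    for ns in nameservices:
        # Per-nameservice keys a client needs to locate the active NameNode.
        for key in (
            f"dfs.ha.namenodes.{ns}",
            f"dfs.client.failover.proxy.provider.{ns}",
        ):
            if key not in props:
                missing.append(key)
        # Each NameNode ID listed for the nameservice needs an RPC address.
        for nn in props.get(f"dfs.ha.namenodes.{ns}", "").split(","):
            nn = nn.strip()
            if nn:
                rpc_key = f"dfs.namenode.rpc-address.{ns}.{nn}"
                if rpc_key not in props:
                    missing.append(rpc_key)
    return missing


# Hypothetical, deliberately incomplete client configuration for illustration.
SAMPLE_HDFS_SITE = """<configuration>
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
  <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>namenode1.example.com:8020</value>
  </property>
</configuration>"""

for key in missing_ha_client_keys(load_hadoop_properties(SAMPLE_HDFS_SITE)):
    print("missing:", key)
```

To run it against a real node, the sample string would be replaced by the contents of the hdfs-site.xml in the configuration directory that the Hadoop command behind HADOOP_CMD actually reads. An empty result would suggest the client configuration is complete and the problem lies elsewhere.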
Looking for newer versions, we found version 1.0.8 of rhdfs and version 3.1.0 of rmr2.
Has anyone faced a problem like this?
Has anyone tested these versions?
Thanks in advance.