rhdfs does not work properly with namenode high availability


Lorenzo Ramírez Hernández

Apr 24, 2014, 7:45:05 AM
to rha...@googlegroups.com
Good evening.

This is the scenario:
      Cluster with head nodes:
           Namenode in H.A.
               Namenode server #1
               Namenode server #2
           Jobtracker
           Cloudera Manager
           Service node
               Hue
               R-Studio + R + Rhadoop

     Slave nodes:
           N nodes with:
                DataNode
               TaskTracker
               R + RHadoop

Problem:
    Prior to deploying Namenode High Availability, we successfully deployed R-Studio Server + R + RHadoop, and we successfully ran several test jobs from R using rmr and rhdfs.
    Next step: deploy high availability. We tested again, but when trying to access HDFS from R, basic HDFS operations took far too long. We then defined several environment variables: HADOOP_CMD, HADOOP_HOME, and HADOOP_STREAMING. In the subsequent tests, basic HDFS operations didn't work at all.
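    For reference, those variables can be set from within R before loading rhdfs. The paths below are illustrative (a typical CDH-style layout) and must be adjusted to the actual installation:

```r
# Illustrative paths -- substitute the locations from your own distribution.
Sys.setenv(HADOOP_CMD = "/usr/bin/hadoop")
Sys.setenv(HADOOP_HOME = "/usr/lib/hadoop")
Sys.setenv(HADOOP_STREAMING = "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar")

library(rhdfs)
hdfs.init()
```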

    It seems that RHadoop doesn't know how to interact with the namenode in HA mode.

    Looking for newer versions, we found version 1.0.8 of rhdfs and version 3.1.0 of rmr2.

    Did anyone face a problem like this?
    Has anyone tested these versions?


Thanks in advance.

                  

Angel Cortes

May 7, 2014, 2:57:42 AM
to rha...@googlegroups.com
Good Morning,

I have the same problem.
We activated HA in HDFS. After that, when we try to use rhdfs, we get this error from hdfs.init():

library(rhdfs)
hdfs.init()

Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl,  : 
  java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1

"nameservice1" is the name of the HA nameservice for the Namenodes. HA is working fine with every product except R.

In other packages, like RHive, we can modify the FS environment to point directly at one of the namenodes (RHive:::.setEnv(DEFAULT_FS="hdfs://namenode1:8020/")) and bypass HA for the RHive process, but with rhdfs we haven't been able to make it work.
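Since the error is java.net.UnknownHostException: nameservice1, the Hadoop client configuration visible to R probably does not define that nameservice, so the logical name is being resolved as a plain hostname. A quick sanity check, assuming the hdfs CLI is on the PATH of the same user that runs R:

```shell
# If the client configuration defines the HA nameservice,
# this prints "nameservice1"; an empty result means the R node
# is missing the HA client config.
hdfs getconf -confKey dfs.nameservices
```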

Could someone help with this issue? Our processes will not work in the production environment unless we solve this problem.

Regards,
Angel

David Champagne

May 7, 2014, 2:58:07 PM
to rha...@googlegroups.com
I have a CDH5 cluster with HA and Kerberos enabled, and rhdfs_1.0.8.  All tests pass without error.  Are you sure your "core-site.xml" and "hdfs-site.xml" are set up properly on the node where you are running R?
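For anyone hitting the UnknownHostException above: the HDFS client on the R node needs the HA nameservice defined. A minimal sketch of the relevant client-side properties, assuming the nameservice is called nameservice1 and the namenode hosts are namenode1/namenode2 (adjust names, IDs, and ports to your cluster):

```xml
<!-- core-site.xml -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://nameservice1</value>
</property>

<!-- hdfs-site.xml -->
<property>
  <name>dfs.nameservices</name>
  <value>nameservice1</value>
</property>
<property>
  <name>dfs.ha.namenodes.nameservice1</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice1.nn1</name>
  <value>namenode1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice1.nn2</name>
  <value>namenode2:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.nameservice1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

These files must be visible on the classpath that rhdfs builds when hdfs.init() runs; on Cloudera-managed nodes, deploying the client configuration from Cloudera Manager is usually enough.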

Angel Cortes

May 8, 2014, 7:34:25 AM
to rha...@googlegroups.com
Hi David,

Thanks for your reply. I just checked, and the package in the system library is rhdfs 1.0.6. If I load rhdfs 1.0.8, it works with only a warning:

14/05/08 13:33:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
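In case it helps others: the fix was simply making sure the newer rhdfs is the one installed in the library R loads from. A sketch, assuming the 1.0.8 source tarball has been downloaded locally from the RHadoop releases page:

```r
# Install the downloaded tarball into the default library, replacing 1.0.6.
install.packages("rhdfs_1.0.8.tar.gz", repos = NULL, type = "source")
library(rhdfs)
packageVersion("rhdfs")  # confirm which version was actually loaded
```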

Thanks a lot