Mapreduce libraries as "provided" runtime dependencies for Spark Programs.


Sharanya Santhanam

Jul 14, 2016, 7:42:35 PM
to CDAP User
I currently have yarn.application.classpath set to the default (from yarn-default.xml), which is:

$HADOOP_COMMON_HOME/share/hadoop/common/*, $HADOOP_COMMON_HOME/share/hadoop/common/lib/*, $HADOOP_HDFS_HOME/share/hadoop/hdfs/*, $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*, $HADOOP_YARN_HOME/share/hadoop/yarn/*, $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*

Note: /share/hadoop/mapreduce is not on the classpath; however, these libraries are available on all of the cluster machines at the locations mentioned above.

Is there a way to set the YARN application classpath from a CDAP application, or any other means to override it from the client side, perhaps a CDAP master config?

Thanks,
Sharanya

Sreevatsan Raman

Jul 14, 2016, 10:12:30 PM
to Sharanya Santhanam, CDAP User
Hi Sharanya,

You can configure app.program.extra.classpath in cdap-site.xml to make an extra classpath available to all CDAP programs.
You will need to restart CDAP after changing the configuration in cdap-site.xml.

Thanks,
Sree

--
You received this message because you are subscribed to the Google Groups "CDAP User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cdap-user+...@googlegroups.com.
To post to this group, send email to cdap...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cdap-user/0cb70744-8a9f-4d7d-8fe7-f639d27b5156%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sharanya Santhanam

Jul 15, 2016, 3:18:16 PM
to CDAP User
Hey Sreevatsan,

I followed your advice, added the following to my cdap-site.xml, and restarted the CDAP services.

 <property>
    <name>app.program.extra.classpath</name>
    <value>$HADOOP_COMMON_HOME/share/hadoop/mapreduce/*:$HADOOP_COMMON_HOME/share/hadoop/mapreduce/lib*</value>
    <description>
      Extra classpath for CDAP programs. Explicitly calling out the MapReduce libs, which are needed for Spark programs.
    </description>
  </property>


However, my job still fails due to missing libraries. After logging into the DataNode that hosts the failing container, I do see that the cConf.xml file contains the property I set:


<property><!--Loaded from cdap-site.xml--><name>app.program.extra.classpath</name><value>$HADOOP_COMMON_HOME/share/hadoop/mapreduce/*:$HADOOP_COMMON_HOME/share/hadoop/mapreduce/lib*</value></property>


Where does the program classpath get set? Am I missing something?

Thanks,
Sharanya  
(Attachment: SparkClassPathIssue)

Sreevatsan Raman

Jul 15, 2016, 7:12:35 PM
to Sharanya Santhanam, CDAP User
Hi Sharanya,

Can you try an absolute path? Environment variables are not expanded in app.program.extra.classpath.

Thanks,
Sree
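For reference, a sketch of what the cdap-site.xml entry might look like with the environment variables spelled out. The /usr/lib/hadoop prefix below is an assumption for illustration, not a path from this thread; substitute your cluster's actual Hadoop home.

```xml
<property>
  <name>app.program.extra.classpath</name>
  <!-- Absolute paths only: variables such as $HADOOP_COMMON_HOME are NOT
       expanded by CDAP in this property. /usr/lib/hadoop is a placeholder
       for your actual Hadoop installation directory. -->
  <value>/usr/lib/hadoop/share/hadoop/mapreduce/*:/usr/lib/hadoop/share/hadoop/mapreduce/lib/*</value>
</property>
```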


Sharanya Santhanam

Jul 19, 2016, 2:37:42 AM
to CDAP User, santhana...@gmail.com
Setting app.program.extra.classpath did not work; the job would still fail at the spark-submit stage due to a classpath issue:
 java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
        at org.apache.hadoop.mapred.JobConf.<clinit>(JobConf.java:119)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2131)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:91)


Although commons-logging.jar was present on the classpath, I was still getting the above error. It boiled down to one classloader loading the MapReduce libs (the program extra classpath libs) and a different loader loading the libs on the YARN classpath.

The only way I was able to resolve this issue was by modifying yarn.application.classpath to include the MapReduce libs.
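Concretely, that fix amounts to appending the MapReduce directories to yarn.application.classpath in yarn-site.xml. A sketch, extending the default value quoted at the top of this thread (verify the exact entries against your distribution; unlike app.program.extra.classpath, the NodeManager does expand these environment variables):

```xml
<property>
  <name>yarn.application.classpath</name>
  <value>
    $HADOOP_COMMON_HOME/share/hadoop/common/*,
    $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*,
    $HADOOP_COMMON_HOME/share/hadoop/mapreduce/*,
    $HADOOP_COMMON_HOME/share/hadoop/mapreduce/lib/*
  </value>
</property>
```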

Terence Yim

Jul 19, 2016, 1:14:22 PM
to Sharanya Santhanam, CDAP User
Hi Sharanya,

Yes, you are right. app.program.extra.classpath only affects the Program ClassLoader (the classloader for your own application), which is different from the one the CDAP system uses; this separation provides classloading isolation (e.g. you can use a different Guava library version than the one used by CDAP). The error you see comes from the system side, when CDAP tries to submit the Spark job to YARN. I believe it is caused by the use of the Hadoop-less Spark assembly JAR (the one shipped with CDH 5.6 or above), hence the need for yarn.application.classpath, which affects the CDAP system classpath.

Terence
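The classloader-isolation behavior described above can be illustrated in plain Java. This is a generic sketch, not CDAP's actual ClassLoader implementation: a class that is perfectly visible to one loader is invisible to an isolated loader that does not share its classpath, which is exactly how a jar that is "on the classpath" can still trigger NoClassDefFoundError in code loaded by a different loader.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class IsolationDemo {
    public static void main(String[] args) throws Exception {
        // This class is visible through the normal application classloader.
        Class<?> visible = Class.forName("IsolationDemo");
        System.out.println("app loader sees: " + visible.getName());

        // An isolated loader with no URLs and only the bootstrap loader as
        // its parent cannot see application classes, even though the same
        // class files sit on the JVM's classpath.
        try (URLClassLoader isolated = new URLClassLoader(new URL[0], null)) {
            try {
                isolated.loadClass("IsolationDemo");
                System.out.println("unexpectedly visible");
            } catch (ClassNotFoundException e) {
                System.out.println("isolated loader cannot see IsolationDemo");
            }
        }
    }
}
```

The analogous situation in the thread: JobConf's static initializer references org.apache.commons.logging.LogFactory, and if the loader that defined JobConf cannot see commons-logging, the result is a NoClassDefFoundError regardless of which other loaders can.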
