Spark with CDH4.2.0

Laxman Vemula

Apr 1, 2013, 10:48:04 AM
to spark...@googlegroups.com
I have a CDH4.2.0 cluster. How should I compile Spark against CDH4.2.0? I tried setting HADOOP_VERSION = "2.0.0-cdh4.2.0" in project/SparkBuild.scala, but the dependencies could not be resolved. Kindly suggest how to make Spark work with YARN on a CDH4.2.0 cluster.

Thanks,
Laxman

Patrick Wendell

Apr 1, 2013, 12:44:53 PM
to spark...@googlegroups.com
Hey Laxman,

Did you see the example in SparkBuild.scala:

// For Hadoop 2 versions such as "2.0.0-mr1-cdh4.1.1", set the
// HADOOP_MAJOR_VERSION to "2"
//val HADOOP_VERSION = "2.0.0-mr1-cdh4.1.1"
//val HADOOP_MAJOR_VERSION = "2"

You should try "2.0.0-mr1-cdh4.2.0". Also, I believe there are some outstanding pull requests to improve YARN support that are going in on the order of days. In the meantime you might want to just statically allocate a few machines for running Spark.
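
For illustration, a minimal sketch of those two settings in project/SparkBuild.scala for CDH 4.2.0 with the MR1 artifacts (based on the commented example above, not a verified build):

// Build against the CDH 4.2.0 MR1 Hadoop client
val HADOOP_VERSION = "2.0.0-mr1-cdh4.2.0"
val HADOOP_MAJOR_VERSION = "2"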

- Patrick

Laxman Vemula

Apr 1, 2013, 6:01:56 PM
to spark...@googlegroups.com
Thanks for the response.

I was referring to the yarn branch on GitHub, which has an old version of SparkBuild.scala. Now I have built it successfully, but I am getting the following error on one of the nodes:

Exception in thread "Thread-2" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:167)
Caused by: java.lang.RuntimeException: java.io.IOException: No FileSystem for scheme: hdfs
	at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:578)
	at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:363)
	at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:336)
	at spark.SparkContext.hadoopFile(SparkContext.scala:252)
	at spark.SparkContext.textFile(SparkContext.scala:224)
	at SvmPrimal$.main(SvmPrimal.scala:33)
	at SvmPrimal.main(SvmPrimal.scala)
	... 5 more
Caused by: java.io.IOException: No FileSystem for scheme: hdfs
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2250)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2257)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:86)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2296)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2278)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:316)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:162)
	at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:574)
	... 11 more

This could be due to a configuration issue with CDH4.2.0. Can somebody help with resolving this?

Thanks,
Laxman

Patrick Wendell

Apr 1, 2013, 6:18:30 PM
to spark...@googlegroups.com
Check out this thread; you may need to add a configuration option because of a CDH change:

https://groups.google.com/forum/?fromgroups=#!topic/spark-users/cSefxQ6HIVU

Laxman Vemula

Apr 1, 2013, 7:11:37 PM
to spark...@googlegroups.com
I have seen that thread. It suggests changing the pom file for Maven, but I am using sbt to build Spark. What needs to be updated if I build using sbt?

Patrick Wendell

Apr 1, 2013, 7:15:26 PM
to spark...@googlegroups.com
Did you try the simpler work-around in that thread?

>> It's fixed, just add 'fs.hdfs.impl' property to conf is ok. The reason is that cdh4.1.1 remove 'fs.hdfs.impl' property from core-default.xml .
>> But there is still something wrong with HdfsTest, I start it like this:
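
For reference, a minimal core-site.xml sketch of that fs.hdfs.impl workaround, assuming the Hadoop 2 HDFS implementation class and that the file is visible on every node:

<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>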

Laxman Vemula

Apr 1, 2013, 7:23:04 PM
to spark...@googlegroups.com
I have tried that by setting the value to 'org.apache.hadoop.dfs.DistributedFileSystem', but I am getting a class-not-found error. Is this the right value to set?


Laxman Vemula

Apr 2, 2013, 3:39:07 PM
to spark...@googlegroups.com
Can somebody help with resolving this issue?

Thanks,
Laxman

Patrick Salami

Apr 2, 2013, 4:10:09 PM
to spark...@googlegroups.com
By the way, we also had an issue with CDH4 because Spark requires the fs.default.name property to be set in Hadoop's core-site.xml file, but this property was deprecated in CDH4. Manually adding the following to core-site.xml (on each node) fixed the issue:

<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoop-testcluster-master:8020</value>
</property>
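
As a quick sanity check that the setting is picked up, reading any HDFS path from the Spark shell should no longer fail with "No FileSystem for scheme: hdfs"; the path below is only a placeholder:

sc.textFile("hdfs://hadoop-testcluster-master:8020/some/input.txt").count()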

Best,

Patrick

Laxman Vemula

Apr 3, 2013, 5:19:37 AM
to spark...@googlegroups.com
Where can I find the core-site.xml that CDH4 uses? I have tried setting it in /usr/lib/hadoop/etc/hadoop, but it isn't reflected when I check from Cloudera Manager.