Spark on CDH4

MLnick

Oct 18, 2012, 11:11:59 AM
to spark...@googlegroups.com
I'm trying to compile Spark 0.6.0 against Hadoop 2.0.0-cdh4.0.0.

I'm running MRv1 (but that shouldn't make a difference, since Spark should only need the HDFS-related classes, yes?).

hadoop-core-2.0.0-cdh4.0.0 doesn't exist, so I tried against hadoop-core-2.0.0-mr1-cdh4.0.0, but I get compilation errors, e.g.:

[error] /home/pentreathn/workspace/scala/spark-0.6.0/core/src/main/scala/spark/HadoopWriter.scala:3: object fs is not a member of package org.apache.hadoop
[error] import org.apache.hadoop.fs.FileSystem
[error]                          ^
[error] /home/pentreathn/workspace/scala/spark-0.6.0/core/src/main/scala/spark/HadoopWriter.scala:4: object fs is not a member of package org.apache.hadoop
[error] import org.apache.hadoop.fs.Path
[error]                          ^
[error] /home/pentreathn/workspace/scala/spark-0.6.0/core/src/main/scala/spark/HadoopWriter.scala:5: ReflectionUtils is not a member of org.apache.hadoop.util
[error] import org.apache.hadoop.util.ReflectionUtils
[error]        ^
[error] /home/pentreathn/workspace/scala/spark-0.6.0/core/src/main/scala/spark/HadoopWriter.scala:6: object io is not a member of package org.apache.hadoop
[error] import org.apache.hadoop.io.NullWritable
[error]                          ^
[error] /home/pentreathn/workspace/scala/spark-0.6.0/core/src/main/scala/spark/HadoopWriter.scala:7: object io is not a member of package org.apache.hadoop
[error] import org.apache.hadoop.io.Text
[error]                          ^
[error] error while loading JobConf, Missing dependency 'class org.apache.hadoop.conf.Configuration', required by /home/pentreathn/workspace/scala/spark-0.6.0/lib_managed/jars/org.apache.hadoop/hadoop-core/hadoop-core-2.0.0-mr1-cdh4.0.0.jar(org/apache/hadoop/mapred/JobConf.class)
[error] /home/pentreathn/workspace/scala/spark-0.6.0/core/src/main/scala/spark/SerializableWritable.scala:6: object io is not a member of package org.apache.hadoop
[error] import org.apache.hadoop.io.Writable
[error]                          ^
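
As far as I can tell, all of these errors point at classes from org.apache.hadoop.{fs, io, util, conf}, which Hadoop 2 moved out of hadoop-core into a separate hadoop-common artifact, so the mr1 jar alone isn't enough on the compile classpath. I'd guess the dependency change in project/SparkBuild.scala looks something like this (the setting names follow SBT convention and the Cloudera repository URL is from memory; untested guesswork, not checked against Spark 0.6.0's actual build file):

// Sketch only -- add Cloudera's repo and the split CDH4 artifacts.
resolvers += "Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/"

libraryDependencies ++= Seq(
  // The MRv1 flavour of hadoop-core shipped with CDH4...
  "org.apache.hadoop" % "hadoop-core"   % "2.0.0-mr1-cdh4.0.0",
  // ...plus hadoop-common, which in Hadoop 2 owns org.apache.hadoop.{fs, io, util, conf}.
  "org.apache.hadoop" % "hadoop-common" % "2.0.0-cdh4.0.0"
)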


Has anyone successfully compiled against CDH4.0.0?

Thomas Dudziak

Oct 18, 2012, 11:38:41 AM
to spark...@googlegroups.com
Unfortunately there have been backwards-incompatible changes in Hadoop 2 to two of the classes that Spark uses. I created a pull request (https://github.com/mesos/spark/pull/264) to allow Spark 0.5 to compile against a Hadoop 2 distribution such as CDH4; I'll try to update it for 0.6 later today.
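
To give a flavour of the incompatibility (the actual diff is in the pull request; which two classes are involved is my recollection, so treat this as a sketch rather than the real change): in Hadoop 2, org.apache.hadoop.mapreduce.JobContext and TaskAttemptContext became interfaces, so code that instantiated them directly under Hadoop 1 no longer compiles. One pattern that works against both versions is a small reflective factory along these lines (HadoopShim is a made-up name for illustration):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.{JobContext, JobID, TaskAttemptContext, TaskAttemptID}

// Version-agnostic factory: try the Hadoop 2 implementation classes first,
// then fall back to the Hadoop 1 concrete classes of the same name.
object HadoopShim {
  private def firstAvailableClass(names: String*): Class[_] =
    names.flatMap { n =>
      try Some(Class.forName(n)) catch { case _: ClassNotFoundException => None }
    }.head

  def newJobContext(conf: Configuration, jobId: JobID): JobContext = {
    val klass = firstAvailableClass(
      "org.apache.hadoop.mapreduce.task.JobContextImpl", // Hadoop 2
      "org.apache.hadoop.mapreduce.JobContext")          // Hadoop 1
    klass.getDeclaredConstructor(classOf[Configuration], classOf[JobID])
      .newInstance(conf, jobId).asInstanceOf[JobContext]
  }

  def newTaskAttemptContext(conf: Configuration, attemptId: TaskAttemptID): TaskAttemptContext = {
    val klass = firstAvailableClass(
      "org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl", // Hadoop 2
      "org.apache.hadoop.mapreduce.TaskAttemptContext")          // Hadoop 1
    klass.getDeclaredConstructor(classOf[Configuration], classOf[TaskAttemptID])
      .newInstance(conf, attemptId).asInstanceOf[TaskAttemptContext]
  }
}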

cheers,
Tom

Matei Zaharia

Oct 18, 2012, 2:08:06 PM
to spark...@googlegroups.com
Yeah, the same approach should work in 0.6. I need to do a bit of work to merge 0.6 into the master branch, but after that, I'd like to apply your pull request to both that and 0.5. If you make it against the "dev" branch right now, that would probably be easiest.

Matei

Ray Racine

Oct 18, 2012, 2:25:13 PM
to spark...@googlegroups.com
Another thought: you could just make the 0.6 branch the master branch and avoid the merge.

Ray Racine

Oct 18, 2012, 2:28:57 PM
to spark...@googlegroups.com
Ignore my previous message; I misread my gitk.

Thomas Dudziak

Oct 18, 2012, 7:18:56 PM
to spark...@googlegroups.com
I've resubmitted the pull request against the dev branch: https://github.com/mesos/spark/pull/285.

cheers,
Tom


On Thursday, October 18, 2012 at 11:28 AM, Ray Racine wrote:

> Ignore my previous message; I misread my gitk.
>

MLnick

Oct 19, 2012, 2:56:54 PM
to spark...@googlegroups.com
Thanks! I applied this patch against dev, and everything now works fine with CDH4.0.0.
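
In case it's useful to anyone else, the simplest check is a tiny job that reads a file off CDH4 HDFS, since that exercises exactly the classes that were failing above. A sketch (the master string and HDFS path are placeholders for your own setup; note that in 0.6 SparkContext lives in the plain spark package):

import spark.SparkContext

// Minimal smoke test: count the lines of one HDFS file to exercise the
// Hadoop client classes on the new classpath.
object Cdh4SmokeTest {
  def main(args: Array[String]) {
    val sc = new SparkContext("local[2]", "CDH4 smoke test")
    val lines = sc.textFile("hdfs://namenode:8020/tmp/sample.txt")
    println("line count: " + lines.count())
    sc.stop()
  }
}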