Spark compilation with CDH 4.5.0

Debasish Das

Dec 13, 2013, 1:37:08 AM
to spark...@googlegroups.com
Hi,

I could compile Spark with CDH 4.2.0, but when I tried to access HDFS it failed.

I looked through the old posts on the Spark user group and found that Spark should be compiled against the exact Hadoop client version of the cluster.

Our cluster is at CDH 4.5.0. I used the following configs to compile the master branch:

export SPARK_HADOOP_VERSION=2.0.0-cdh4.5.0
export SPARK_YARN=true

I also tried to see if I could build against the MR1 client only:

export SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.5.0
export SPARK_YARN=false
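
In both cases the actual build step is the sbt launcher bundled with Spark, run from the Spark root, i.e. something like:

./sbt/sbt assembly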

I am getting 43 compilation errors from the spark-streaming project.

A few of the messages are below:

[error] /home/debasish/sag_spark/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala:51: type mismatch;
[error]  found   : org.apache.spark.streaming.DStream[(K, V)]
[error]  required: org.apache.spark.streaming.api.java.JavaPairDStream[K,V]
[error]  Note: implicit method fromPairDStream is not applicable here because it comes after the application point and it lacks an explicit result type
[error]     dstream.filter((x => f(x).booleanValue()))
[error]                   ^
[error] /home/debasish/sag_spark/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala:54: type mismatch;
[error]  found   : org.apache.spark.streaming.DStream[(K, V)]
[error]  required: org.apache.spark.streaming.api.java.JavaPairDStream[K,V]
[error]  Note: implicit method fromPairDStream is not applicable here because it comes after the application point and it lacks an explicit result type
[error]   def cache(): JavaPairDStream[K, V] = dstream.cache()
[error]                                                     ^
[error] /home/debasish/sag_spark/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala:57: type mismatch;
[error]  found   : org.apache.spark.streaming.DStream[(K, V)]
[error]  required: org.apache.spark.streaming.api.java.JavaPairDStream[K,V]
[error]  Note: implicit method fromPairDStream is not applicable here because it comes after the application point and it lacks an explicit result type
[error]   def persist(): JavaPairDStream[K, V] = dstream.persist()
[error]                                                         ^
[error] /home/debasish/sag_spark/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala:60: type mismatch;
[error]  found   : org.apache.spark.streaming.DStream[(K, V)]
[error]  required: org.apache.spark.streaming.api.java.JavaPairDStream[K,V]
[error]  Note: implicit method fromPairDStream is not applicable here because it comes after the application point and it lacks an explicit result type
[error]   def persist(storageLevel: StorageLevel): JavaPairDStream[K, V] = dstream.persist(storageLevel)
[error]                                                                                   ^
[error] /home/debasish/sag_spark/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala:66: type mismatch;
[error]  found   : org.apache.spark.streaming.DStream[(K, V)]
[error]  required: org.apache.spark.streaming.api.java.JavaPairDStream[K,V]
[error]  Note: implicit method fromPairDStream is not applicable here because it comes after the application point and it lacks an explicit result type
[error]   def repartition(numPartitions: Int): JavaPairDStream[K, V] = dstream.repartition(numPartitions)
[error]                                                                                   ^
[error] /home/debasish/sag_spark/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala:83: type mismatch;
[error]  found   : org.apache.spark.streaming.DStream[(K, V)]
[error]  required: org.apache.spark.streaming.api.java.JavaPairDStream[K,V]
[error]  Note: implicit method fromPairDStream is not applicable here because it comes after the application point and it lacks an explicit result type
[error]     dstream.window(windowDuration)
[error]                   ^

Note that the project compiled fine with CDH 4.2.0, but I could not access our HDFS data.

Thanks.
Deb

Tathagata Das

Dec 13, 2013, 1:39:05 AM
to spark...@googlegroups.com
Can you try doing an "sbt clean" before building? I have seen this error once before, and a clean build helped.
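
For example, from the Spark root, with the same Hadoop settings you were already exporting:

./sbt/sbt clean
./sbt/sbt assembly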

Debasish Das

Dec 13, 2013, 2:54:36 AM
to spark...@googlegroups.com
Thanks TD. sbt clean helped.

With these configs I could build the jar file, and it runs fine on the standalone Spark cluster:

export SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.5.0
export SPARK_YARN=false

If I try to generate the deployment jar for YARN with the following configs, I get errors:

export SPARK_HADOOP_VERSION=2.0.0-cdh4.5.0
export SPARK_YARN=true

Thanks.
Deb 

Patrick Wendell

Dec 13, 2013, 2:03:32 PM
to spark...@googlegroups.com
What errors are you getting in this case? Are they the same errors as before, or something else?

Debasish Das

Dec 16, 2013, 7:51:15 PM
to spark...@googlegroups.com
Hi Patrick,

With the following configs:

export SPARK_HADOOP_VERSION=2.0.0-cdh4.5.0
export SPARK_YARN=true

Inside the yarn project, the errors are as follows:

[warn] /home/debasish/sag_spark/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:59: Treating numbers with a leading zero as octal is deprecated.
[warn]   val STAGING_DIR_PERMISSION: FsPermission = FsPermission.createImmutable(0700:Short)
[warn]                                                                           ^
[warn] /home/debasish/sag_spark/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:62: Treating numbers with a leading zero as octal is deprecated.
[warn]   val APP_FILE_PERMISSION: FsPermission = FsPermission.createImmutable(0644:Short) 
[warn]                                                                        ^
[error] /home/debasish/sag_spark/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:36: object AMResponse is not a member of package org.apache.hadoop.yarn.api.records
[error] import org.apache.hadoop.yarn.api.records.{AMResponse, ApplicationAttemptId}
[error]        ^
[error] /home/debasish/sag_spark/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala:105: value getAMResponse is not a member of org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse
[error]     val amResp = allocateWorkerResources(workersToRequest).getAMResponse
[error]                                                            ^
[warn] two warnings found
[error] two errors found
[error] (yarn/compile:compile) Compilation failed
[error] Total time: 15 s, completed Dec 16, 2013 7:47:03 PM
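
For what it's worth, the two warnings are just Scala 2.10 deprecating leading-zero octal literals; a non-deprecated equivalent (untested) would be something like:

val STAGING_DIR_PERMISSION: FsPermission = FsPermission.createImmutable(Integer.parseInt("700", 8).toShort)

The two AMResponse errors are the real blocker.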

Note that I can run the code against the CDH 4.5.0 MR1 client, but we need the YARN jar for deployment.

Thanks.
Deb

Matei Zaharia

Dec 17, 2013, 1:02:41 AM
to spark...@googlegroups.com
Ah, this is because of a YARN API update in CDH 4.5.0 (as well as Apache Hadoop 2.2). You’ll need to wait for Spark 0.8.1 to compile against that. There is a release candidate posted on our Apache mailing list: http://spark.incubator.apache.org/mailing-lists.html.
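
For context, the compile errors above are exactly that change: AMResponse was folded into AllocateResponse, so the alpha-API call pattern has to change shape. A rough sketch of the difference (not the actual Spark patch):

// alpha YARN API (CDH 4.1-4.3): allocation results hang off a separate AMResponse
val amResp = allocateWorkerResources(workersToRequest).getAMResponse
val allocated = amResp.getAllocatedContainers()

// newer YARN API (Hadoop 2.2 / CDH 4.4+): the same accessors live on AllocateResponse itself
val response = allocateWorkerResources(workersToRequest)
val allocated = response.getAllocatedContainers()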

Matei

Debasish Das

Dec 17, 2013, 2:56:24 PM
to spark...@googlegroups.com
Thanks Matei. 

We will wait for the Spark 0.8.1 release candidate and see whether it can run against the latest CDH/HDP YARN.

Koert Kuipers

Dec 17, 2013, 3:26:56 PM
to spark...@googlegroups.com
I compiled against CDH 4.3.0 and have been using it with 4.5.0 without trouble. I believe all CDH 4.x versions are wire compatible.

Kevin Moulart

Dec 23, 2013, 9:43:21 AM
to spark...@googlegroups.com
I just tried to compile version 0.8.1 against CDH 4.5.0, and it failed in just the same way.

Patrick Wendell

Dec 23, 2013, 12:16:38 PM
to spark...@googlegroups.com
Hey Kevin,

Could you give us the exact command that you are using to compile?
It's possible the YARN API changed in CDH 4.5 and our heuristics don't
detect it correctly.

Kevin Moulart

Dec 23, 2013, 12:27:57 PM
to spark...@googlegroups.com
Hi, thanks for the answer. I'm using this command to compile:

SPARK_HADOOP_VERSION=2.0.0-cdh4.5.0 SPARK_YARN=true ./sbt/sbt assembly

When I do that, it runs for about 3-5 minutes and then, after a long "packaging" phase, simply says it failed.

--
Kévin Moulart
GSM France : +33 7 81 06 10 10
GSM Belgique : +32 473 85 23 85
Téléphone fixe : +32 2 771 88 45

Patrick Wendell

Dec 23, 2013, 1:41:49 PM
to spark...@googlegroups.com
Hey Kevin,

I looked some more. It turns out CDH 4.4 and 4.5 include changes to
the YARN API that are not compatible with Spark's YARN implementation,
specifically [1].

We've gone through some effort to make Spark's YARN support work well
with both the YARN 2.2 stable APIs (which will be in CDH 5) and one
popular version of the "alpha" YARN APIs (the one in CDH 4.1-4.3 and
used inside Yahoo).

However, right now Spark doesn't support this particular version, as
it is a one-off build that appears only in CDH 4.4/4.5. So the
solution here is either to roll back to an earlier CDH, to patch
Spark to work with CDH 4.4's version of the YARN API, or to just
deploy Spark using the standalone cluster manager instead of YARN
(AFAIK YARN is still considered experimental in CDH 4.x anyway).

[1] https://issues.apache.org/jira/browse/YARN-45
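
For the standalone route, the MR1 build that worked earlier in this thread should be all you need, i.e. something like:

SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.5.0 SPARK_YARN=false ./sbt/sbt assembly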

Kevin Moulart

Dec 23, 2013, 1:51:27 PM
to spark...@googlegroups.com
Hey, thanks again. That's what I feared, but I had hoped they would ensure backward compatibility in this new build of CDH.

I'll try with the standalone cluster manager.

Thanks again, and merry Christmas!