I searched the old posts on the Spark user group and found that Spark should be compiled against the exact Hadoop client version of the cluster.
Our cluster runs CDH 4.5.0. I used the following configs for the compilation on the master branch:
I have attached a few of the error messages.
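(The exact configs used aren't shown above. For reference, the Spark build docs of this era select the cluster's Hadoop client at assembly time via an environment variable; the CDH artifact version below is an assumption for a CDH 4.5.0 MR1 cluster, not necessarily the configs that were used.)

```shell
# Hypothetical sketch: build Spark's assembly against a specific
# Hadoop client version, as described in the sbt build docs.
# 2.0.0-mr1-cdh4.5.0 is an assumed CDH 4.5.0 MR1 artifact version.
SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.5.0 sbt/sbt assembly

# For a YARN (MR2) cluster the equivalent would be:
# SPARK_HADOOP_VERSION=2.0.0-cdh4.5.0 SPARK_YARN=true sbt/sbt assembly
```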
[error] /home/debasish/sag_spark/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala:51: type mismatch;
[error] found : org.apache.spark.streaming.DStream[(K, V)]
[error] required: org.apache.spark.streaming.api.java.JavaPairDStream[K,V]
[error] Note: implicit method fromPairDStream is not applicable here because it comes after the application point and it lacks an explicit result type
[error] dstream.filter((x => f(x).booleanValue()))
[error] ^
[error] /home/debasish/sag_spark/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala:54: type mismatch;
[error] found : org.apache.spark.streaming.DStream[(K, V)]
[error] required: org.apache.spark.streaming.api.java.JavaPairDStream[K,V]
[error] Note: implicit method fromPairDStream is not applicable here because it comes after the application point and it lacks an explicit result type
[error] def cache(): JavaPairDStream[K, V] = dstream.cache()
[error] ^
[error] /home/debasish/sag_spark/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala:57: type mismatch;
[error] found : org.apache.spark.streaming.DStream[(K, V)]
[error] required: org.apache.spark.streaming.api.java.JavaPairDStream[K,V]
[error] Note: implicit method fromPairDStream is not applicable here because it comes after the application point and it lacks an explicit result type
[error] def persist(): JavaPairDStream[K, V] = dstream.persist()
[error] ^
[error] /home/debasish/sag_spark/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala:60: type mismatch;
[error] found : org.apache.spark.streaming.DStream[(K, V)]
[error] required: org.apache.spark.streaming.api.java.JavaPairDStream[K,V]
[error] Note: implicit method fromPairDStream is not applicable here because it comes after the application point and it lacks an explicit result type
[error] def persist(storageLevel: StorageLevel): JavaPairDStream[K, V] = dstream.persist(storageLevel)
[error] ^
[error] /home/debasish/sag_spark/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala:66: type mismatch;
[error] found : org.apache.spark.streaming.DStream[(K, V)]
[error] required: org.apache.spark.streaming.api.java.JavaPairDStream[K,V]
[error] Note: implicit method fromPairDStream is not applicable here because it comes after the application point and it lacks an explicit result type
[error] def repartition(numPartitions: Int): JavaPairDStream[K, V] = dstream.repartition(numPartitions)
[error] ^
[error] /home/debasish/sag_spark/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala:83: type mismatch;
[error] found : org.apache.spark.streaming.DStream[(K, V)]
[error] required: org.apache.spark.streaming.api.java.JavaPairDStream[K,V]
[error] Note: implicit method fromPairDStream is not applicable here because it comes after the application point and it lacks an explicit result type
[error] dstream.window(windowDuration)
[error] ^
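The compiler note repeated in each error points at one Scala rule: an implicit method defined later in the same file can only be applied at an earlier point if it declares an explicit result type, so on this branch `fromPairDStream` apparently lacks one. A minimal standalone sketch of the rule (class and method names here are illustrative, not Spark's):

```scala
object ImplicitOrderDemo {
  class Inner(val n: Int)
  class Wrapper(val inner: Inner)

  def demo(): Int = {
    // This conversion is applied before fromInner's textual definition.
    // It compiles only because fromInner declares an explicit result
    // type (Wrapper); without it, the compiler would report the same
    // "comes after the application point" note as above.
    val w: Wrapper = new Inner(42)
    w.inner.n
  }

  // Explicit result type makes the implicit usable at application
  // points that precede it in the same file.
  implicit def fromInner(inner: Inner): Wrapper = new Wrapper(inner)
}
```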
Note that the project compiled fine against CDH 4.2.0, but then I could not access our HDFS data.
Thanks.