CassandraRdd.map(row => row.getInt("id")) does not work: java.lang.ClassNotFoundException!

chen qiang

Mar 25, 2015, 3:51:57 AM
to spark-conn...@lists.datastax.com
Hi everyone:
I'm testing Spark with C* and ran into this error: rdd.map() does not work; the exception is java.lang.ClassNotFoundException. Has anyone seen this happen?

Here is my code:

var rdd = sc.cassandraTable("keyspace", "cf")
rdd.collect               // this line works (returns the 5 rows)
var row = rdd.first       // this line works too
var id = row.getInt("id") // this line works
rdd.map(row => row.getInt("id")) // error!!!

Here is the stacktrace:

scala> rdd2.map(row => (row.getInt("id"))).count
15/03/25 15:43:28 INFO spark.SparkContext: Starting job: count at <console>:41
15/03/25 15:43:28 INFO scheduler.DAGScheduler: Got job 21 (count at <console>:41) with 221 output partitions (allowLocal=false)
15/03/25 15:43:28 INFO scheduler.DAGScheduler: Final stage: Stage 23(count at <console>:41)
15/03/25 15:43:28 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/03/25 15:43:28 INFO scheduler.DAGScheduler: Missing parents: List()
15/03/25 15:43:28 INFO scheduler.DAGScheduler: Submitting Stage 23 (MappedRDD[15] at map at <console>:41), which has no missing parents
15/03/25 15:43:28 INFO storage.MemoryStore: ensureFreeSpace(5272) called with curMem=44479, maxMem=17781636464
15/03/25 15:43:28 INFO storage.MemoryStore: Block broadcast_22 stored as values in memory (estimated size 5.1 KB, free 16.6 GB)
15/03/25 15:43:28 INFO storage.MemoryStore: ensureFreeSpace(2863) called with curMem=49751, maxMem=17781636464
15/03/25 15:43:28 INFO storage.MemoryStore: Block broadcast_22_piece0 stored as bytes in memory (estimated size 2.8 KB, free 16.6 GB)
15/03/25 15:43:28 INFO storage.BlockManagerInfo: Added broadcast_22_piece0 in memory on glodon-b111-beidou-164-40.glodon.com:59058 (size: 2.8 KB, free: 16.6 GB)
15/03/25 15:43:28 INFO storage.BlockManagerMaster: Updated info of block broadcast_22_piece0
15/03/25 15:43:28 INFO scheduler.DAGScheduler: Submitting 221 missing tasks from Stage 23 (MappedRDD[15] at map at <console>:41)
15/03/25 15:43:28 INFO scheduler.TaskSchedulerImpl: Adding task set 23.0 with 221 tasks
15/03/25 15:43:28 INFO scheduler.TaskSetManager: Starting task 27.0 in stage 23.0 (TID 1086, glodon-b111-beidou-164-40.glodon.com, NODE_LOCAL, 2868 bytes)
15/03/25 15:43:28 INFO storage.BlockManagerInfo: Added broadcast_22_piece0 in memory on glodon-b111-beidou-164-40.glodon.com:58041 (size: 2.8 KB, free: 265.4 MB)
15/03/25 15:43:28 INFO scheduler.TaskSetManager: Starting task 28.0 in stage 23.0 (TID 1087, glodon-b111-beidou-164-40.glodon.com, NODE_LOCAL, 2598 bytes)
15/03/25 15:43:28 WARN scheduler.TaskSetManager: Lost task 27.0 in stage 23.0 (TID 1086, glodon-b111-beidou-164-40.glodon.com): java.lang.ClassNotFoundException: $line208.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
java.net.URLClassLoader$1.run(URLClassLoader.java:366)
java.net.URLClassLoader$1.run(URLClassLoader.java:355)
java.security.AccessController.doPrivileged(Native Method)
java.net.URLClassLoader.findClass(URLClassLoader.java:354)
java.lang.ClassLoader.loadClass(ClassLoader.java:425)
java.lang.ClassLoader.loadClass(ClassLoader.java:358)
java.lang.Class.forName0(Native Method)
java.lang.Class.forName(Class.java:274)
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:59)
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
15/03/25 15:43:28 INFO scheduler.TaskSetManager: Starting task 27.1 in stage 23.0 (TID 1088, glodon-b111-beidou-164-40.glodon.com, NODE_LOCAL, 2868 bytes)
15/03/25 15:43:28 INFO scheduler.TaskSetManager: Lost task 28.0 in stage 23.0 (TID 1087) on executor glodon-b111-beidou-164-40.glodon.com: java.lang.ClassNotFoundException ($line208.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1) [duplicate 1]
15/03/25 15:43:28 INFO scheduler.TaskSetManager: Starting task 28.1 in stage 23.0 (TID 1089, glodon-b111-beidou-164-40.glodon.com, NODE_LOCAL, 2598 bytes)
15/03/25 15:43:28 INFO scheduler.TaskSetManager: Lost task 27.1 in stage 23.0 (TID 1088) on executor glodon-b111-beidou-164-40.glodon.com: java.lang.ClassNotFoundException ($line208.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1) [duplicate 2]
15/03/25 15:43:28 INFO scheduler.TaskSetManager: Starting task 27.2 in stage 23.0 (TID 1090, glodon-b111-beidou-164-40.glodon.com, NODE_LOCAL, 2868 bytes)
15/03/25 15:43:28 INFO scheduler.TaskSetManager: Lost task 28.1 in stage 23.0 (TID 1089) on executor glodon-b111-beidou-164-40.glodon.com: java.lang.ClassNotFoundException ($line208.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1) [duplicate 3]
15/03/25 15:43:28 INFO scheduler.TaskSetManager: Starting task 28.2 in stage 23.0 (TID 1091, glodon-b111-beidou-164-40.glodon.com, NODE_LOCAL, 2598 bytes)
15/03/25 15:43:28 INFO scheduler.TaskSetManager: Lost task 27.2 in stage 23.0 (TID 1090) on executor glodon-b111-beidou-164-40.glodon.com: java.lang.ClassNotFoundException ($line208.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1) [duplicate 4]
15/03/25 15:43:28 INFO scheduler.TaskSetManager: Starting task 27.3 in stage 23.0 (TID 1092, glodon-b111-beidou-164-40.glodon.com, NODE_LOCAL, 2868 bytes)
15/03/25 15:43:28 INFO scheduler.TaskSetManager: Lost task 28.2 in stage 23.0 (TID 1091) on executor glodon-b111-beidou-164-40.glodon.com: java.lang.ClassNotFoundException ($line208.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1) [duplicate 5]
15/03/25 15:43:28 INFO scheduler.TaskSetManager: Starting task 28.3 in stage 23.0 (TID 1093, glodon-b111-beidou-164-40.glodon.com, NODE_LOCAL, 2598 bytes)
15/03/25 15:43:28 INFO scheduler.TaskSetManager: Lost task 27.3 in stage 23.0 (TID 1092) on executor glodon-b111-beidou-164-40.glodon.com: java.lang.ClassNotFoundException ($line208.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1) [duplicate 6]
15/03/25 15:43:28 ERROR scheduler.TaskSetManager: Task 27 in stage 23.0 failed 4 times; aborting job
15/03/25 15:43:28 INFO scheduler.TaskSchedulerImpl: Cancelling stage 23
15/03/25 15:43:28 INFO scheduler.TaskSchedulerImpl: Stage 23 was cancelled
15/03/25 15:43:28 INFO scheduler.DAGScheduler: Failed to run count at <console>:41
15/03/25 15:43:28 INFO scheduler.TaskSetManager: Lost task 28.3 in stage 23.0 (TID 1093) on executor glodon-b111-beidou-164-40.glodon.com: java.lang.ClassNotFoundException ($line208.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1) [duplicate 7]
15/03/25 15:43:28 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 23.0, whose tasks have all completed, from pool
org.apache.spark.SparkException: Job aborted due to stage failure: Task 27 in stage 23.0 failed 4 times, most recent failure: Lost task 27.3 in stage 23.0 (TID 1092, glodon-b111-beidou-164-40.glodon.com): java.lang.ClassNotFoundException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
java.net.URLClassLoader$1.run(URLClassLoader.java:366)
java.net.URLClassLoader$1.run(URLClassLoader.java:355)
java.security.AccessController.doPrivileged(Native Method)
java.net.URLClassLoader.findClass(URLClassLoader.java:354)
java.lang.ClassLoader.loadClass(ClassLoader.java:425)
java.lang.ClassLoader.loadClass(ClassLoader.java:358)
java.lang.Class.forName0(Native Method)
java.lang.Class.forName(Class.java:274)
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:59)
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Helena Edelson

Mar 25, 2015, 2:51:38 PM
to spark-conn...@lists.datastax.com
Hi Chen, can you share your build file so we can see your dependencies?

Helena
@helenaedelson

chen qiang

Mar 25, 2015, 8:32:08 PM
to spark-conn...@lists.datastax.com
Hi Helena:
Thanks for the response.

We are using C* 2.0.11 with Spark 1.1.1. This is my Assembly.sbt:

import AssemblyKeys._

assemblySettings

mergeStrategy in assembly <<= (mergeStrategy in assembly) { mergeStrategy =>
  {
    case entry =>
      val strategy = mergeStrategy(entry)
      if (strategy == MergeStrategy.deduplicate) MergeStrategy.first
      else strategy
  }
}

name := "sp"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.1"

libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.1"

resolvers += Resolver.url("artifactory", url("http://scalasbt.artifactoryonline.com/scalasbt/sbt-plugin-releases"))(Resolver.ivyStylePatterns)

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")
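(The resolver and addSbtPlugin lines normally belong in project/plugins.sbt rather than in the main build file, since sbt only resolves plugins declared under the project/ directory.)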

and this is the code I entered in spark-shell:
sc.stop
import com.datastax.spark.connector.cql.CassandraConnector
import com.datastax.spark.connector.rdd.partitioner.dht.{CassandraNode, Token, TokenFactory}
import com.datastax.spark.connector.rdd.partitioner.dht.TokenRange
import com.datastax.spark.connector.rdd.partitioner.{CassandraRDDPartitioner, CassandraPartition, CqlTokenRange}
import com.datastax.spark.connector.cql._
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import com.datastax.spark.connector.rdd.CqlWhereClause
import com.datastax.spark.connector._
import org.apache.cassandra.thrift
import scala.collection.JavaConversions._
import java.net.InetAddress
import org.apache.spark.sql.cassandra.CassandraSQLContext
import org.apache.spark.sql.{ SQLContext, SchemaRDD}
import com.datastax.spark.connector.CassandraRow


var conf = new SparkConf(true).set("spark.cassandra.connection.host", "192.168.164.28").set("spark.cassandra.input.split.size", "1000")
var sc = new SparkContext(conf)
var rdd = sc.cassandraTable("xxx", "xxx")
rdd.map(x => x.getInt("id")).count

Helena Edelson

Mar 26, 2015, 7:36:21 AM
to spark-conn...@lists.datastax.com
Hi

Did you deploy the Cassandra/connector jars to all nodes of the Spark cluster?

qian...@gmail.com

Mar 26, 2015, 11:35:15 PM
to spark-connector-user
Hi Helena:
   No, we did not do that. I used sbt assembly to build all the jars into one assembly, and I was using the --jars parameter of spark-shell.
   Is this a problem?
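
For reference, the invocation looked roughly like this (the assembly jar path is illustrative):

bin/spark-shell --jars /path/to/sp-assembly-1.0.jar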


Helena Edelson

Mar 27, 2015, 12:30:56 PM
to spark-conn...@lists.datastax.com
Hi Chen,

I have replicated this behavior in the past in the REPL when the wrong classloader is used.
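
A quick way to see which loaders are in play, as a rough diagnostic sketch from inside spark-shell (nothing connector-specific):

// Walk the classloader chain of a class defined in the REPL
val cl = getClass.getClassLoader
Iterator.iterate(cl)(_.getParent).takeWhile(_ != null).foreach(println)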

Helena
@helenaedelson

On Thursday, March 26, 2015 at 11:35:15 PM UTC-4, chen qiang wrote:
> Hi Helena:
>    No, we did not do that. I used sbt assembly to build all the jars into one assembly, and I was using the --jars parameter of spark-shell.
>    Is this a problem?

Russell Spitzer

Mar 27, 2015, 12:34:22 PM
to spark-conn...@lists.datastax.com
If --jars does not work, try using:

 spark.executor.extraClassPath  spark-cassandra-connector/spark-cassandra-connector/target/scala-{binary.version}/spark-cassandra-connector-assembly-$CurrentVersion-SNAPSHOT.jar
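
For example, on the command line (the path to the assembly jar is illustrative):

./bin/spark-shell --conf spark.executor.extraClassPath=/path/to/spark-cassandra-connector-assembly-1.1.1.jar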

chen qiang

Mar 28, 2015, 7:05:00 AM
to spark-conn...@lists.datastax.com
What is a wrong classloader? Can you be more specific? Thanks.

chen qiang

Mar 28, 2015, 7:06:42 AM
to spark-conn...@lists.datastax.com

Thanks Russell. Does this require deploying my assembly jar to all nodes of the Spark cluster?

chen qiang

Jun 30, 2015, 10:06:09 AM
to spark-conn...@lists.datastax.com
I have tried sc.addJar, conf.setJars, and spark.executor.extraClassPath with no luck. Looking through the Apache Spark JIRA, I found one problem just like this one, SPARK-6299, but someone said that problem does not exist on Spark 1.1, so I'm confused. That bug talks about the classloader, just like Helena said. Any more advice? I'm so tired of building a jar and submitting it to Spark, even for a tiny little change of my code!
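
For reference, the programmatic attempts looked roughly like this (the jar path is illustrative):

val conf = new SparkConf(true)
  .set("spark.cassandra.connection.host", "192.168.164.28")
  .setJars(Seq("/path/to/sp-assembly-1.0.jar")) // ship the assembly to the executors
val sc = new SparkContext(conf)
sc.addJar("/path/to/sp-assembly-1.0.jar") // same idea, after the context exists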

Jan Algermissen

Aug 25, 2015, 3:38:39 AM
to DataStax Spark Connector for Apache Cassandra
I have exactly the same issue.

Seems to be caused by mapping over the CassandraRow:

I am using Spark 1.4 M1 with the Cassandra connector and ran into a strange error when using the spark shell.

This works:

sc.cassandraTable("events", "bid_events").select("bid","type").take(10).foreach(println)


But as soon as I put a map() in there (or filter):

sc.cassandraTable("events", "bid_events").select("bid","type").map(r => r).take(10).foreach(println)


I get the exception below.

The spark-shell call is:

/opt/spark/bin/spark-shell --master spark://xxxxx-1:7077 --conf spark.cassandra.connection.host=$(hostname -i) --driver-class-path $(echo /root/*.jar |sed 's/ /:/g') --jar spark-cassandra-connector-assembly-1.4.0-M1-SNAPSHOT.jar

Can anyone provide ideas on how to approach debugging this?

Jan


15/08/24 23:54:43 INFO DAGScheduler: Job 0 failed: take at <console>:32, took 1.999875 s
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 10.31.39.116): java.lang.ClassNotFoundException: $anonfun$1
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:66)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInp

Russell Spitzer

Aug 25, 2015, 11:34:16 AM
to DataStax Spark Connector for Apache Cassandra
I'll take a look at this today. Note that you shouldn't have to put anything on the driver classpath now; --jars spark-cassandra-connector-assembly-1.4.0-M1-SNAPSHOT.jar should satisfy the connector requirements.

Russell Spitzer

Aug 25, 2015, 7:54:01 PM
to DataStax Spark Connector for Apache Cassandra
I'm unable to repro this on Spark 1.4.1, on SCC master or on SCC b1.4. Can you give any more details about your setup when you experience this?

Jan Algermissen

Aug 26, 2015, 5:42:55 AM
to spark-conn...@lists.datastax.com

On 26 Aug 2015, at 01:53, Russell Spitzer <rus...@datastax.com> wrote:

> I'm unable to repro this on Spark 1.4.1, on SCC master or on SCC b1.4. Can you give any more details about your setup when you experience this?
>

Spark is 1.4.1; I tried SCC 1.4.0-M1 and 1.4.0-M3 (same effect).

What else would help?

Jan

weiqiang chen

Aug 26, 2015, 6:25:04 AM
to DataStax Spark Connector for Apache Cassandra
This only happens in the spark shell.

--- Original Message ---

From: "Russell Spitzer" <rus...@datastax.com>
Sent: August 26, 2015, 07:54
To: "DataStax Spark Connector for Apache Cassandra" <spark-conn...@lists.datastax.com>
Subject: Re: Re: CassandraRdd.map(row => row.getInt("id")) does not work: java.lang.ClassNotFoundException!

I'm unable to repro this on Spark 1.4.1, on SCC master or on SCC b1.4. Can you give any more details about your setup when you experience this?

Russell Spitzer

Aug 26, 2015, 10:46:21 AM
to DataStax Spark Connector for Apache Cassandra

Have you tried only using --jars? I wasn't able to repro it in the shell.

Jan Algermissen

Aug 26, 2015, 1:11:25 PM
to spark-conn...@lists.datastax.com

On 26 Aug 2015, at 16:46, Russell Spitzer <rus...@datastax.com> wrote:

> Have you tried only using --jars? I wasn't able to repro it in the shell.

Using only --jars causes the import statements for the table functions to fail.

Jan

Russell Spitzer

Aug 26, 2015, 1:18:42 PM
to spark-conn...@lists.datastax.com
On Spark 1.4.1? I just tried this a few moments ago using only --jars. For the Scala shell this should be sufficient to get everything on both the driver path and the executors.

For pyspark I know there is a bug, though, where you need both.
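For pyspark that means passing both, something like this (the jar name is illustrative):

./bin/pyspark --jars connector-assembly.jar --driver-class-path connector-assembly.jar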

Jan Algermissen

Aug 26, 2015, 1:41:19 PM
to spark-conn...@lists.datastax.com

On 26 Aug 2015, at 19:18, Russell Spitzer <rus...@datastax.com> wrote:

> On Spark 1.4.1? I just tried this a few moments ago using only --jars. For the Scala shell this should be sufficient to get everything on both the driver path and the executors.
>

Strange.

/opt/spark/bin/spark-shell --version gives me 1.4.1

Could you perhaps send the command line you are using?

Jan

Russell Spitzer

Aug 26, 2015, 2:34:46 PM
to spark-conn...@lists.datastax.com
./bin/spark-shell --master  spark://Russells-MacBook-Pro-2.local:7077 --jars ~/repos/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.4.0-M3-SNAPSHOT.jar --conf spark.driver.host=127.0.0.1

Jan Algermissen

Aug 26, 2015, 6:54:08 PM
to spark-conn...@lists.datastax.com
The only difference I see is that I am using the 2.11 build.

Have you checked that?

Spark is custom-built in my setup; IIRC I told operations to compile for 2.11, but maybe that slipped. Should/can I check the Scala version of the running cluster?
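
(One check that should work from the shell itself, at least for the REPL side, is printing the Scala version the shell was built with:)

scala> scala.util.Properties.versionString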

Jan

Etienne Couritas

Aug 27, 2015, 3:45:56 AM
to DataStax Spark Connector for Apache Cassandra
If you are tired of typing the --jars params, use this plugin: https://github.com/crakjie/sbt-spark-plugin. You just have to set your sparkHome directory and run sbt package submit to have your jars sent.

weiqiang chen

Aug 27, 2015, 5:55:01 AM
to spark-conn...@lists.datastax.com
Guys, what I saw is this: using --jars or the driver path is not the problem.

The problem is, if you ever write an anonymous function that depends on SCC and use it with a map function like this:

table.map(_.getString(0)).count

there will be an exception saying the anonfun class is not found.

I think Spark is to blame. Am I right?


--- Original Message ---

From: "Russell Spitzer" <rus...@datastax.com>
Sent: August 27, 2015, 02:34
To: spark-conn...@lists.datastax.com
Subject: Re: CassandraRdd.map(row => row.getInt("id")) does not work: java.lang.ClassNotFoundException!

Russell Spitzer

Aug 27, 2015, 12:21:49 PM
to spark-conn...@lists.datastax.com
@weiqiang I'm having no problems passing any kind of lambda on Scala 2.10 or 2.11 builds of Spark.

@Jan I built Spark against Scala 2.11 and I am noticing that --jars doesn't work like it does on 2.10. I'm taking a closer look now.

Russell Spitzer

Aug 27, 2015, 12:26:29 PM
to spark-conn...@lists.datastax.com

Russell Spitzer

Aug 27, 2015, 12:34:18 PM
to spark-conn...@lists.datastax.com
For a workaround you can always manually place the jar on all of the remote executors and then add it to their classpaths with:

--driver-class-path sparkconnector-assembly.jar
--conf spark.executor.extraClassPath=sparkconnector-assembly.jar
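
Putting those together, the full invocation would look something like this (the jar path is illustrative):

./bin/spark-shell --driver-class-path /path/to/sparkconnector-assembly.jar --conf spark.executor.extraClassPath=/path/to/sparkconnector-assembly.jar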

Or of course run with Scala 2.10 :)




chen qiang

Sep 1, 2015, 4:47:39 AM
to DataStax Spark Connector for Apache Cassandra
Hi Russell:
I think I've run into a dead end.
I double-checked my code, and I do not think Scala 2.11 is my problem. Here are the configurations of my Spark setup and the jar file I used with --jars. I'm still facing the anonymous function problem. I've done these things already and the problem is still the same:
1. I'm using Scala 2.10
2. I'm using SCC-1.3.0, Oracle JDK 1.7.1-x64, Cassandra 2.1.8
3. I tried --jars, spark.executor.extraClassPath, and both
4. I tried --master local[2]
My command line is:
bin/spark-shell --jars /opt/spark/current_userlog/source/connector-source-assembly-1.3.jar

then type this:
scala>:load source.scala
scala>t.filter(_.getString("xxx") == "aaa").count

Here is the error message:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in stage 0.0 (TID 59, 10.129.20.85): java.lang.ClassNotFoundException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

Attachments: spark-defaults.conf, spark-env.sh, build.sbt, source.scala

Russell Spitzer

Sep 1, 2015, 12:13:14 PM
to DataStax Spark Connector for Apache Cassandra
I'm pretty sure the main problem here is that you start a context with --jars, then kill that context and start another one. Try simplifying your code: instead of setting all of those Spark conf options and creating new contexts, run your shell like this. Also, the jar that you want on the classpath is the connector assembly jar, not a custom build of a Scala script you want to run.

./spark-shell --conf spark.cassandra.connection.host=10.129.20.80 ...

You should not need to modify the ack.wait.timeout or the executor.extraClasspath.

Have you tried (without all of the context modification) just running the following, not via the :load command?

val t = sc.cassandraTable("alldata", "data").where("appenddatetime > '2015-09-01 12:00:00'")




chen qiang

Sep 2, 2015, 3:37:03 AM
to DataStax Spark Connector for Apache Cassandra
Problem solved. Thanks Russell, you're right. The cause was that I stopped the auto-generated sc. I added all the conf.set() lines to spark-defaults.conf instead, and it works!
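
For anyone hitting the same thing, the entries I moved into spark-defaults.conf look roughly like this (the host and split size are the values from my earlier messages; treat them as illustrative):

spark.cassandra.connection.host 192.168.164.28
spark.cassandra.input.split.size 1000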

It took me nearly 6 months to solve this. Thanks again.

Russell Spitzer

Sep 2, 2015, 11:15:27 AM
to DataStax Spark Connector for Apache Cassandra
I'm glad you got to the end of it! :)

On Wed, Sep 2, 2015 at 12:37 AM chen qiang <qian...@gmail.com> wrote:
Problem solved. Thanks Russell, you're right. The cause was that I stopped the auto-generated sc. I added all the conf.set() lines to spark-defaults.conf instead, and it works!

It took me nearly 6 months to solve this. Thanks again.


Mohamadreza.r

Aug 29, 2017, 8:02:34 AM
to DataStax Spark Connector for Apache Cassandra, qian...@gmail.com
On Wednesday, September 2, 2015 at 12:07:03 (UTC+4:30), chen qiang wrote:
> Problem solved. Thanks Russell, you're right. The cause was that I stopped the auto-generated sc. I added all the conf.set() lines to spark-defaults.conf instead, and it works!
>
> It took me nearly 6 months to solve this. Thanks again.

Hi,
I have the same problem when connecting from the Java API (a Java client). How do I set this config? (I don't use spark-shell.)