spark.executor.extraClassPath - Values not picked up by executors

Todd Nist

May 22, 2015, 6:45:21 PM
to spark-conn...@lists.datastax.com

I posted this to the Spark user group, but I'm posting here as well in case it is related to the connector, though I doubt it.

I'm using the spark-cassandra-connector from DataStax in a Spark Streaming job launched from my own driver, based on the KillrWeather reference application. It connects to a standalone cluster on my local box, which has two workers running.

This is Spark 1.3.1 and spark-cassandra-connector-1.3.0-SNAPSHOT. I modified the dependencies in KillrWeather to use Spark 1.3.1 and built the snapshot of the connector.

I have added the following entry to my $SPARK_HOME/conf/spark-defaults.conf:

spark.executor.extraClassPath /projects/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar

When I start the master with $SPARK_HOME/sbin/start-master.sh, it comes up just fine, as do the two workers with the following commands:

Worker 1, port 8081:

radtech:spark $ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://radtech.io:7077 --webui-port 8081 --cores 2

Worker 2, port 8082:

radtech:spark $ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://radtech.io:7077 --webui-port 8082 --cores 2


When I execute the driver, connecting to the master:

sbt app/run -Dspark.master=spark://radtech.io:7077

It starts up, but when the executors are launched, their launch command does not include the entry from spark.executor.extraClassPath:

15/05/22 17:35:26 INFO Worker: Asked to launch executor app-20150522173526-0000/0 for KillrWeatherApp$
15/05/22 17:35:26 INFO ExecutorRunner: Launch command: "java" "-cp" "/usr/local/spark/conf:/usr/local/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar:/usr/local/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark/lib/datanucleus-core-3.2.10.jar:/usr/local/spark/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/spark/conf:/usr/local/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar:/usr/local/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark/lib/datanucleus-core-3.2.10.jar:/usr/local/spark/lib/datanucleus-rdbms-3.2.9.jar" "-Dspark.driver.port=55932" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://spark...@192.168.1.3:55932/user/CoarseGrainedScheduler" "--executor-id" "0" "--hostname" "192.168.1.3" "--cores" "2" "--app-id" "app-20150522173526-0000" "--worker-url" "akka.tcp://spark...@192.168.1.3:55923/user/Worker"


which then causes the executor to fail with a ClassNotFoundException, as I would expect:

[WARN] [2015-05-22 17:38:18,035] [org.apache.spark.scheduler.TaskSetManager]: Lost task 0.0 in stage 2.0 (TID 23, 192.168.1.3): java.lang.ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:344)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

I also notice that some of the entries on the executor classpath are duplicated. This is a newly installed spark-1.3.1-bin-hadoop2.6 standalone cluster, set up just to ensure nothing from earlier testing was in the way.

I can set SPARK_CLASSPATH in $SPARK_HOME/conf/spark-env.sh and the jar is picked up and appended fine.

Any suggestions on what is going on here? It seems to just ignore whatever I have in spark.executor.extraClassPath. Is there a different way to do this?

TIA.

-Todd

Helena Edelson

May 26, 2015, 8:23:46 AM
to spark-conn...@lists.datastax.com
Hi Todd,

You may just need to add the jar to SparkConf.setJars() in the app, or it could be a misconfiguration in your build changes.
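For example, a minimal sketch of that, assuming a standalone app that builds its own SparkConf (the jar path is just a placeholder for your own build output):

import org.apache.spark.SparkConf

// Sketch only: point setJars at the application jar so it is shipped to the executors.
val conf = new SparkConf()
  .setAppName("KillrWeatherApp")
  .setMaster("spark://radtech.io:7077")
  .setJars(Seq("/path/to/your/app-jar.jar"))  // placeholder path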
  
I have not pushed the latest version upgrades to KillrWeather because we have not yet released the connector version supporting Spark 1.3. This is coming soon. I'll be updating the repo before heading to talk at ScalaDays Amsterdam in two weeks.

HELENA EDELSON
Senior Software Engineer, Analytics


Todd Nist

May 26, 2015, 7:01:09 PM
to spark-conn...@lists.datastax.com
OK, so I believe I answered my own question :). I just did the following:

lazy val conf = new SparkConf().setAppName(getClass.getSimpleName)
  .setMaster(SparkMaster)
  .set("spark.cassandra.connection.host", CassandraHosts)
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "com.datastax.killrweather.KillrKryoRegistrator")
  .set("spark.cleaner.ttl", SparkCleanerTtl.toString)
  // Ship the application jar itself to the executors.
  .setJars(Array("/projects/radtech.io/killrweather/killrweather-app/target/scala-2.10/app_2.10-1.0.0-SNAPSHOT.jar"))
  // .setJars(SparkContext.jarOfClass(this.getClass).toSeq)

lazy val sc = new SparkContext(conf)

// Ship the connector assembly and the core module as well.
sc.addJar("/projects/radtech.io/spark-cassandra-connector/spark-connector/target/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar")
sc.addJar("/projects/radtech.io/killrweather/killrweather-core/core_2.10-1.0.0-SNAPSHOT.jar")

So is there a "best" practice for adding dependencies like this?  In the past I have used .set("spark.executor.extraClassPath", "/common/file/share/here/*") and pushed all dependencies into that directory as part of the build. This works and ensures that the code stays in sync with the current dependencies, but I'm curious how others handle this.
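For reference, that shared-directory approach looks roughly like the sketch below (the directory and app name are placeholders, not the actual KillrWeather values):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: every worker must have the shared directory available at the same local path.
val conf = new SparkConf()
  .setAppName("my-streaming-app")                                     // placeholder
  .setMaster("spark://radtech.io:7077")
  .set("spark.executor.extraClassPath", "/common/file/share/here/*")  // executors resolve the wildcard locally
val sc = new SparkContext(conf)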

In this case only two addJar calls are required, but as the number of dependencies increases, how are others ensuring that the entries passed to addJar are complete?

Thanks for the earlier assistance, and I look forward to hearing how others handle this.

-Todd

On Tue, May 26, 2015 at 4:58 PM, Todd Nist <tsin...@gmail.com> wrote:
Hi Helena,

Thanks for the feedback.  I have added the jar with setJars, but that still results in the same error, which sort of makes sense when I think about it. Would this not have to be a fat jar (an assembly) for it to work? If I simply add the jar generated by sbt package, it will not contain any of the dependencies, such as the spark-cassandra-connector, the DataStax driver, ..., that the executors depend on; am I missing something here?

I can think of ways to achieve this by adding:

 .set("spark.executor.extraClassPath", "/common/file/share/here/*") 

Or I could use addJar for each of the required jars, which is easy enough to do using something like the sbt native packager and then loading all of the jars (or the appropriate ones) from the lib directory, something like:

// Sketch: register every non-empty jar under ./lib with the SparkContext.
val jars = Option(new java.io.File("./lib").listFiles()).toSeq.flatten
  .filter(f => f.getName.endsWith(".jar") && f.length != 0)
jars.foreach(f => sparkContext.addJar(f.getAbsolutePath))

I have the first one working, but it needs to be automated and cleaned up.

So am I missing something obvious here?  Is there an easier or cleaner solution that I'm missing?  I don't think there is, but I wanted to validate.

-Todd

Jose Antonio Omedes Capdevila

May 26, 2015, 7:56:51 PM
to spark-conn...@lists.datastax.com
Hello,

I had a similar issue with Spark 1.2.2 and version 1.2.0-rc3 of the connector. In the end I loaded the connector not through spark-defaults.conf but on the command line when issuing the spark-submit command.

See the entire discussion on:


I was having issues with a different class, but it was solved by passing the classpath options on the submit command:

./bin/spark-shell \
  --master spark://ubuntu:7077 \
  --driver-class-path /home/automaton/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.2.0-SNAPSHOT.jar \
  --conf spark.executor.extraClassPath=/home/automaton/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.2.0-SNAPSHOT.jar \
  --conf spark.cassandra.connection.host=127.0.0.1

Jose

Mohammed Guller

May 26, 2015, 8:55:37 PM
to spark-conn...@lists.datastax.com

Hi Todd,

Any reason why you are not building a fat jar for the Spark Cassandra connector?

 

Mohammed

 


Todd Nist

May 27, 2015, 8:06:49 AM
to spark-conn...@lists.datastax.com
Hi Mohammed,

Thanks for the feedback, appreciate it.

I have built a fat jar for the spark-cassandra-connector.  The issue is with building the KillrWeather reference app and having it connect to a remote Spark cluster rather than local[*]; it is the driver and is not going through spark-submit.

I have it working for now.  Building a fat jar for KillrWeather does not seem right, as it would end up needing to include Spark, Akka, and several other dependencies, which could cause conflicts when running on the Spark cluster.

So it works with addJar(...) or set("spark.executor.extraClassPath", "/common/file/share/here/*"), so I think I'm good for now, but I am open to other alternatives.

Again, thanks for the input.

-Todd

Sergey Vasilyev

Jul 24, 2015, 4:43:52 PM
to DataStax Spark Connector for Apache Cassandra, tsin...@gmail.com
Hi,

The master, the workers (executors), and the driver are separate processes, and each has its own classpath.

spark.executor.extraClassPath is picked up only by the executors, not by the driver. That means each executor will have the connector jar on its classpath, but the driver will not. To add a jar to the driver's classpath, use the --driver-class-path command-line option or set spark.driver.extraClassPath in spark-defaults.conf on the driver's machine.
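For example, to cover both sides in spark-defaults.conf (the path below is just a placeholder for the connector assembly jar):

spark.driver.extraClassPath    /path/to/spark-cassandra-connector-assembly.jar
spark.executor.extraClassPath  /path/to/spark-cassandra-connector-assembly.jar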
