Trouble deploying Java Standalone Spark Job

Scott Langevin

May 17, 2013, 3:01:10 PM

to spark...@googlegroups.com

I've been trying to figure out how to write a standalone Spark job and deploy it on a Spark cluster we have running on Mesos. Our Spark installation is functional: we can connect to spark-shell and run jobs interactively, but I'm trying to build a standalone job we can deploy.

I have followed the quick start instructions on how to use Maven for Spark dependencies in a Java Eclipse project: http://spark-project.org/docs/latest/quick-start.html

My Eclipse project is pretty simple. It's just a Test class with a main(), which creates a JavaSparkContext and does a few simple map-reduce operations on a text file. My JavaSparkContext looks like this:

JavaSparkContext sc = new JavaSparkContext("mesos://master:5050", "TEST", "/opt/spark-0.7.0", "SparkTest-0.0.1-SNAPSHOT.jar");

Where I'm stuck is how to actually deploy this to the cluster. I'm using Maven to create the jar file (SparkTest-0.0.1-SNAPSHOT.jar), which I copied to the Spark master node. I tried to execute Test.main() using:

java -cp SparkTest-0.0.1-SNAPSHOT.jar sparktest.Test

But I get the following exception:

Exception in thread "main" java.lang.NoClassDefFoundError: spark/api/java/function/FlatMapFunction
Caused by: java.lang.ClassNotFoundException: spark.api.java.function.FlatMapFunction
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: com.oculus.spark.Test. Program will exit.

I also tried building a jar file with all the dependencies included, but that gave an Akka configuration exception. I found others on this mailing list who had a similar problem, and they solved it by not bundling all the dependencies with the jar.

So does anyone know how to actually deploy a Java Spark job? What is the best practice?

Thanks!

Scott

Josh Rosen

May 17, 2013, 3:47:34 PM

to spark...@googlegroups.com

It looks like Spark's files aren't present on the classpath.
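
One way to confirm this is a small check that prints the JVM's classpath and tries to load the class named in the exception. This is only a sketch: `ClasspathCheck` and `isOnClasspath` are hypothetical names, not part of Spark, and the default class name is simply the one from the stack trace.

```java
import java.io.File;

// Hypothetical helper, not part of Spark: reports whether a class can be
// loaded from the classpath of the JVM it runs in.
class ClasspathCheck {

    // Returns true if the named class is visible to this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // Covers NoClassDefFoundError when a class this one depends on is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class from the stack trace; another name can be passed as an argument.
        String className = args.length > 0 ? args[0] : "spark.api.java.function.FlatMapFunction";
        for (String entry : System.getProperty("java.class.path").split(File.pathSeparator)) {
            System.out.println("classpath entry: " + entry);
        }
        System.out.println(className + (isOnClasspath(className) ? " was found" : " was NOT found"));
    }
}
```

If this class were compiled into the job jar and launched the same way as the failing command (`java -cp SparkTest-0.0.1-SNAPSHOT.jar ClasspathCheck`), it would presumably list only the job jar and report the Spark class as not found.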

(Disclaimer: I haven't actually tested any of the following, so I could be wrong)

Try running

SPARK_CLASSPATH=SparkTest-0.0.1-SNAPSHOT.jar $SPARK_HOME/run sparktest.Test

This will use the Spark `run` script to add the required Spark classes to the classpath and load the settings from spark-env.sh.

I don't think that you should bundle the Spark library in your JAR. In general, Spark releases are API-compatible with each other but not binary-compatible: for example, you can't connect to a cluster running Spark 0.6.0 from a client application running against the Spark 0.6.1 JAR. What I'd do is to mark Spark as a "provided" dependency in whatever build system you're using, then use Spark's `run` script (or your own custom environment setup script) to add the cluster's Spark JARs to the classpath. This will ensure that your code runs against the version of Spark that's actually installed on your cluster.
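
As a rough sketch of the "provided" approach in Maven, the dependency entry might look like the fragment below. The coordinates are an assumption based on the Spark 0.7.0 quick-start guide and should be checked against the Spark version actually installed on the cluster.

```xml
<!-- Assumed coordinates for Spark 0.7.0; verify against your cluster's version. -->
<dependency>
  <groupId>org.spark-project</groupId>
  <artifactId>spark-core_2.9.2</artifactId>
  <version>0.7.0</version>
  <!-- "provided": available at compile time, but not packaged into the job jar. -->
  <scope>provided</scope>
</dependency>
```

With this scope the code still compiles against Spark, while the Spark classes themselves have to come from the runtime classpath, for example via the `run` script.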

I don't think that we have a list of best practices for deploying Spark jobs to clusters; that would be a useful addition to our documentation.

--
You received this message because you are subscribed to the Google Groups "Spark Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-users...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Scott Langevin

May 17, 2013, 7:11:41 PM

to spark...@googlegroups.com

Thanks Josh!

That seems to have done the trick. I can now run my job against our cluster.

Scott

Ravi Hemnani

Dec 12, 2013, 7:03:21 AM

to spark...@googlegroups.com

@Scott Langevin: Can you tell me what "sparktest.Test" is?

I am trying my hand at Spark and running the examples against a cluster that I created, but I am getting the same error. How did you solve the issue? Where did you modify SPARK_CLASSPATH?

Max

Dec 27, 2013, 4:33:42 PM

to spark...@googlegroups.com

What is the difference between including my app's (dependency) jars in SPARK_CLASSPATH and passing them in the "jars" parameter (a Seq) of SparkContext? In my case, SPARK_CLASSPATH works fine, but the latter approach produces a runtime exception. When I checked the slaves, the jars had been shipped to and loaded on the workers, but the runtime exception says something is not found.