I've been trying to figure out how to write a standalone Spark job and deploy it on a Spark cluster we have running on Mesos. Our Spark installation is functional: we can connect with spark-shell and run jobs interactively, but now I'm trying to build a standalone job we can deploy.
My Eclipse project is pretty simple: it's just a Test class with a main() that creates a JavaSparkContext and runs a few simple map-reduce operations on a text file. My JavaSparkContext looks like this:
JavaSparkContext sc = new JavaSparkContext("mesos://master:5050", "TEST", "/opt/spark-0.7.0", "SparkTest-0.0.1-SNAPSHOT.jar");
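For context, here's a trimmed-down sketch of what the class does, written against the 0.7 Java API; the word-count logic and the input path are representative, not my exact code:

package sparktest;

import java.util.Arrays;

import scala.Tuple2;
import spark.api.java.JavaPairRDD;
import spark.api.java.JavaRDD;
import spark.api.java.JavaSparkContext;
import spark.api.java.function.FlatMapFunction;
import spark.api.java.function.Function2;
import spark.api.java.function.PairFunction;

public class Test {
  public static void main(String[] args) {
    // master URL, app name, Spark home on the cluster nodes,
    // and the jar containing this class
    JavaSparkContext sc = new JavaSparkContext("mesos://master:5050", "TEST",
        "/opt/spark-0.7.0", "SparkTest-0.0.1-SNAPSHOT.jar");

    JavaRDD<String> lines = sc.textFile("/tmp/input.txt");

    // split each line into words
    JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
      public Iterable<String> call(String line) {
        return Arrays.asList(line.split(" "));
      }
    });

    // classic word count: emit (word, 1) pairs, then sum the counts per word
    JavaPairRDD<String, Integer> counts = words.map(new PairFunction<String, String, Integer>() {
      public Tuple2<String, Integer> call(String word) {
        return new Tuple2<String, Integer>(word, 1);
      }
    }).reduceByKey(new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer a, Integer b) {
        return a + b;
      }
    });

    System.out.println("distinct words: " + counts.count());
  }
}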
Where I'm stuck is how to actually deploy this to the cluster. I'm using Maven to build the jar file (SparkTest-0.0.1-SNAPSHOT.jar), which I copied to the Spark master node. There I tried to execute Test.main() with:
java -cp SparkTest-0.0.1-SNAPSHOT.jar sparktest.Test
But I get the following exception:
Exception in thread "main" java.lang.NoClassDefFoundError: spark/api/java/function/FlatMapFunction
Caused by: java.lang.ClassNotFoundException: spark.api.java.function.FlatMapFunction
at java.security.AccessController.doPrivileged(Native Method)
Could not find the main class: sparktest.Test. Program will exit.
I also tried building an uber jar with all the dependencies included, but that threw an Akka configuration exception. I found others on this mailing list who had hit the same problem, and they solved it by not bundling all the dependencies into the jar.
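If that's the right approach, I assume it means marking spark-core as provided in the pom so Maven leaves it out of the assembled jar; something like the snippet below (I'm not certain these are the exact artifact coordinates for 0.7.0, so please correct me if they're wrong):

<dependency>
  <!-- 'provided' keeps Spark out of the assembled jar, since the
       cluster nodes already have these classes under /opt/spark-0.7.0 -->
  <groupId>org.spark-project</groupId>
  <artifactId>spark-core_2.9.2</artifactId>
  <version>0.7.0</version>
  <scope>provided</scope>
</dependency>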
So does anyone know how to actually deploy a Java Spark job? What is the best practice?