Hi,
I have built the entire application into a single jar and distributed it to Spark in standalone mode, and it works. But the jar is 50 MB, and it takes a worker node 9 seconds to download it. Is there a way to deploy the jar to the worker nodes manually so that Spark can still find it? I tried the following:
1. built the Spark-related classes into a separate jar, e.g. spark-support.jar
2. set SPARK_CLASSPATH in conf/spark-env.sh to point to spark-support.jar
3. built the application classes into an independent jar, e.g. app.jar
4. when instantiating JavaSparkContext, passed the path of app.jar as a parameter.
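For reference, this is roughly what step 2 looks like on my machine (the jar path below is just an example from my own setup, not a standard location):

```shell
# conf/spark-env.sh on each worker node
# spark-support.jar holds the Spark classes the application compiles against
export SPARK_CLASSPATH=/opt/myapp/lib/spark-support.jar
```

I restarted the workers after editing the file, in case that matters.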
When I start the app, it throws the following exception:
Caused by: java.lang.ClassNotFoundException: spark.api.java.function.FlatMapFunction
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 22 more
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6.541s
[INFO] Finished at: Wed Feb 06 14:24:44 PST 2013
[INFO] Final Memory: 9M/481M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:java (default-cli) on project gdv-run: An exception occured while executing the Java class. null: InvocationTargetException: spark/api/java/function/FlatMapFunction: spark.api.java.function.FlatMapFunction -> [Help 1]
It seems the workers cannot find the classes in spark-support.jar. Could someone suggest where I should put the jar and which environment variables I should set?
Thank you