Need suggestion to move Spark jars out of application jar


liu

Feb 6, 2013, 5:55:59 PM
to spark...@googlegroups.com
Hi,

I have built the entire application into one jar and deployed it to Spark in standalone mode, and it works. But the jar is 50 MB, and it takes a worker node 9 seconds to download it. Is there a way to deploy the jar to the worker nodes manually so that Spark can find it? I tried the following:
1. Build the Spark-related classes into a separate jar, e.g. spark-support.jar.
2. Set SPARK_CLASSPATH in conf/spark-env.sh to point to spark-support.jar.
3. Build the application classes into an independent jar, e.g. app.jar.
4. When instantiating JavaSparkContext, provide the path of app.jar as a parameter (see the sketch below).
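
For reference, step 4 looks roughly like this, assuming the 0.6-era Java API (the master URL, Spark home, and jar path are placeholders):

import spark.api.java.JavaSparkContext;

public class AppMain {
    public static void main(String[] args) {
        // Placeholder values: standalone master URL, app name, Spark home
        // on the workers, and the app-only jar built in step 3.
        JavaSparkContext sc = new JavaSparkContext(
                "spark://master:7077",
                "MyApp",
                "/opt/spark",
                new String[] { "/path/to/app.jar" });
        // ... build RDDs with sc; workers fetch app.jar when tasks run.
    }
}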

When I start to run the app, it throws the following exception:
Caused by: java.lang.ClassNotFoundException: spark.api.java.function.FlatMapFunction
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        ... 22 more
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6.541s
[INFO] Finished at: Wed Feb 06 14:24:44 PST 2013
[INFO] Final Memory: 9M/481M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:java (default-cli) on project gdv-run: An exception occured while executing the Java class. null: InvocationTargetException: spark/api/java/function/FlatMapFunction: spark.api.java.function.FlatMapFunction -> [Help 1]

It cannot find the spark-support.jar. Could someone suggest where I should put the jar and which environment variables I should set?

Thank you

Matei Zaharia

Feb 6, 2013, 6:10:09 PM
to spark...@googlegroups.com
Spark will already have its own code on the classpath on the workers, so you should only need to provide your own classes in a JAR. That is, just packaging your app and passing that as app.jar, without setting SPARK_CLASSPATH, should work.

Matei



liu

Feb 6, 2013, 6:21:09 PM
to spark...@googlegroups.com
Thank you for your reply. I made the following changes:
1. When building the application, I changed the scope of spark-core_2.9.2 to provided (see the pom fragment below).
2. Removed SPARK_CLASSPATH from conf/spark-env.sh.
3. Ran bin/start-all.sh to restart the Spark server in standalone mode.
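
For concreteness, the dependency now looks roughly like this (the version number here is an assumption):

<!-- spark-core is provided by the cluster at runtime,
     so it stays out of the application jar. -->
<dependency>
  <groupId>org.spark-project</groupId>
  <artifactId>spark-core_2.9.2</artifactId>
  <version>0.6.2</version>
  <scope>provided</scope>
</dependency>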

I still have the same exception. 

Thank you,

liu

Feb 6, 2013, 7:01:14 PM
to spark...@googlegroups.com
Sorry, the above issue happened when I tested on my laptop. When I tested on a real cluster, I got a new error:
Caused by: com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.remote.log-received-messages'
        at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:126)
        at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:146)
        at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:151)
        at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:151)
        at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:119)
        at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:157)
        at com.typesafe.config.impl.SimpleConfig.getBoolean(SimpleConfig.java:167)

I use Maven to build the app into one jar (jar-with-dependencies).
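
For reference, a sketch of the standard jar-with-dependencies setup this implies (the exact configuration may differ):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <descriptorRefs>
      <!-- Built-in descriptor: unpack all dependencies into one jar. -->
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>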

Could someone give me some suggestions?

Thank you

Matei Zaharia

Feb 6, 2013, 7:11:58 PM
to spark...@googlegroups.com
Oh, I see, it might not work with the one-jar plugin. What I was talking about is just mvn package, which creates a JAR with just your classes.

This particular error is because some Akka libraries are missing. I think listing Spark as "provided" is a problem for that -- it would be better to configure the one-jar plugin to skip it somehow.
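
If a single fat jar is unavoidable, one known cause of this exact ConfigException is that each dependency's Akka reference.conf overwrites the previous one as the jars are unpacked together; maven-shade-plugin can append those files instead (a sketch, with the plugin version as an assumption):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.0</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Concatenate every reference.conf into one file so Akka's
               default settings survive the merge. -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
            <resource>reference.conf</resource>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>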

Matei

liu

Feb 6, 2013, 7:24:41 PM
to spark...@googlegroups.com
Thank you, I'll try the other way to package the jar.

liu

Feb 7, 2013, 4:07:12 PM
to spark...@googlegroups.com
Updates:
I excluded Spark and its dependencies from the final jar, but when I run it on the cluster (multiple machines), it still throws an exception:
Caused by: com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.remote.log-received-messages'
        at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:126)
        at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:146)
        at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:151)
        at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:151)
        at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:119)
        at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:157)
 
If I try to run it on my laptop (master URL spark://0.0.0.0:7077), it throws exceptions saying it cannot find some Spring classes, which are in the jar.

BTW, I'm using the following descriptor to build the jar (I manually listed all the dependencies I need; maybe there's a better way to do it):
<dependencySets>
  <dependencySet>
    <outputDirectory>/</outputDirectory>
    <useProjectArtifact>true</useProjectArtifact>
    <unpack>true</unpack>
    <useTransitiveDependencies>false</useTransitiveDependencies>
    <excludes>
      <exclude>org.spark-project:spark-core_2.9.2</exclude>
    </excludes>
    <includes>
      <include>org.springframework.data:*</include>
      <include>org.springframework:*</include>
      <include>com.gdv:*</include>
      <include>commons-*:*</include>
    </includes>
  </dependencySet>
</dependencySets>
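
A possible alternative to maintaining the include list by hand, assuming a maven-assembly-plugin version that supports transitive filtering, is to keep transitive dependencies and exclude only Spark, letting the exclusion apply along transitive paths too:

<dependencySets>
  <dependencySet>
    <outputDirectory>/</outputDirectory>
    <useProjectArtifact>true</useProjectArtifact>
    <unpack>true</unpack>
    <!-- Pull in everything the project needs transitively... -->
    <useTransitiveDependencies>true</useTransitiveDependencies>
    <!-- ...and let the exclude below also drop what spark-core drags in. -->
    <useTransitiveFiltering>true</useTransitiveFiltering>
    <excludes>
      <exclude>org.spark-project:spark-core_2.9.2</exclude>
    </excludes>
  </dependencySet>
</dependencySets>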

Thank you,

liu

Feb 7, 2013, 5:20:43 PM
to spark...@googlegroups.com
Sorry, a correction: when I run on my laptop, I get:
Caused by: java.lang.ClassNotFoundException: spark.api.java.function.FlatMapFunction
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)

It cannot find the Spark function classes.

Could someone give me some suggestions?
Thank you,

liu

Feb 7, 2013, 5:24:22 PM
to spark...@googlegroups.com
I think I know what's causing the problem when running on the laptop; I'll fix it and try it on the cluster.
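Since spark-core is now provided, exec-maven-plugin's default runtime classpath no longer includes the Spark classes when launching the driver locally; pointing the plugin at the compile classpath (which does include provided-scope dependencies) should address it. A hypothetical fragment:

<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>exec-maven-plugin</artifactId>
  <version>1.2.1</version>
  <configuration>
    <!-- The compile classpath includes provided-scope dependencies such as
         spark-core, so the locally launched driver can load the Spark API. -->
    <classpathScope>compile</classpathScope>
  </configuration>
</plugin>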
Thanks

liu

Feb 8, 2013, 2:56:49 AM
to spark...@googlegroups.com
OK, solved the issue: simply don't build a big fat jar.
Thanks