adding jars to spark context


Stephen Haberman

Jan 9, 2013, 5:10:40 PM
to spark...@googlegroups.com
Hi,

I have a Spark standalone cluster in EC2, and a jar with code I'd
like to use in a REPL session. So I log into the master, run spark-shell,
and then, since I've downloaded the jar locally onto the master, call:

scala> sc.addJar("file:///...jar")
13/01/09 22:05:33 INFO spark.SparkContext: Added JAR
file:///home/hadoop/....jar at http://10.70.7.20:55974/jars/...jar with
timestamp 1357769133587

But now I cannot import the code from the jar:

scala> import com.foo._
<console>:10: error: object foo is not a member of package com

Instead, I've put my application's jar in spark/lib_managed/jars, which
seems somewhat hacky, but then spark-shell can load it; am I missing a
better way?

- Stephen

Matei Zaharia

Jan 9, 2013, 5:13:25 PM
to spark...@googlegroups.com
Ah, addJar doesn't add it to the classpath of the master JVM (in this case spark-shell) -- it assumes those classes are already there, and just adds a JAR to send to the worker nodes. I guess we could change it to add the JAR in spark-shell too, but otherwise you can either add it to the lib directory as you said, or add it to the SPARK_CLASSPATH environment variable before launching spark-shell.
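
Roughly, something like this (the jar name and path are just placeholders, and this assumes the launch scripts pick up SPARK_CLASSPATH as usual):

$ export SPARK_CLASSPATH=/home/hadoop/myapp.jar   # makes the classes visible to the shell JVM itself
$ ./spark-shell
scala> sc.addJar("file:///home/hadoop/myapp.jar") // still ships the jar to the worker nodes
scala> import com.foo._                           // should resolve now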

Matei

Stephen Haberman

Jan 9, 2013, 5:51:01 PM
to spark...@googlegroups.com

> Ah, addJar doesn't add it into the classpath of the master JVM (in
> this case spark-shell)

Cool, makes sense. Something like this? (Feel free to comment on the
pull request instead.)

https://github.com/mesos/spark/pull/359

It seems to work locally for me, although admittedly I have not spun it
up on a cluster yet.

So, if running in a cluster, and a shell connects to the master, we can
add the jar to the workers and the shell, but the master itself won't
need it? (I'm still wrapping my head around Spark's execution model.)

Thanks for the quick response.

- Stephen

Rajiv Abraham

Apr 16, 2013, 4:51:42 PM
to spark...@googlegroups.com
Hi Guys,

I looked at the pull request, but I confess to being a bit lost.

Could you please let us know how to add project-specific libraries/jars to the spark-shell, so that I can test snippets of my code there?
For example, I want to use the spark-shell to test how the fastutil libraries (as suggested in the tuning guide) work with RDDs and what their memory usage looks like.
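
Concretely, this is the sort of thing I'd like to be able to run in the shell once the jar is on its classpath (the class used here is just an example):

scala> import it.unimi.dsi.fastutil.ints.IntOpenHashSet
scala> val sets = sc.parallelize(1 to 1000000).mapPartitions { iter =>
     |   val s = new IntOpenHashSet()     // fastutil primitive-int set
     |   iter.foreach(i => s.add(i))
     |   Iterator(s)
     | }
scala> sets.cache()
scala> sets.count()   // force the partitions to be computed and cached, so I can look at their memory usage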

I was also wondering if I could run the spark-shell from within a Scala project and import my project-specific libraries directly.

Warm Regards,
Rajiv

Mark Hamstra

Apr 16, 2013, 5:11:33 PM
to spark...@googlegroups.com
If you were running standalone code, then you could add a jar to the SparkContext with SparkContext.addJar. However, that doesn't work correctly from within the spark-shell. To add jars to the spark-shell, your best option is to patch the code, and then specify the needed jars in an environment variable before starting spark-shell.
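
For the standalone-code case, here is a minimal sketch of what that looks like (the master URL, app name, and jar path are placeholders; it assumes a release from this era, where the package is plain `spark`):

import spark.SparkContext

object AddJarExample {
  def main(args: Array[String]) {
    val sc = new SparkContext("spark://master:7077", "AddJarExample")
    // addJar ships the jar to the worker nodes; the driver JVM itself
    // must already have these classes on its classpath.
    sc.addJar("/home/hadoop/myapp.jar")
    // ... run jobs that use classes from the jar ...
    sc.stop()
  }
}

In the spark-shell, by contrast, the driver JVM is already running by the time you can call addJar, so the jar has to be on its classpath at launch time -- which is why the environment-variable route is needed there.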