How to run Spark code from Scala IDE or Terminal?


Gaurav Dasgupta

Oct 1, 2012, 4:18:25 AM
to spark...@googlegroups.com
Hi Users,
 
Can someone brief me the approaches required to execute my own spark codes from Scala IDE or terminal. All I understood that I have to build the project with "spark-core-SNAPSHOT.jar" and hence I imported that external jar to my Scala project.
But when I try to execute the code from Scala IDE, it hangs.
 
Also, if I try to run the code from the command line using the "scala" command, it doesn't run. It gives an error saying it cannot find the "spark" object used in the line "import spark.SparkContext", even though the external Spark JAR is in the working directory.
 
I am a beginner with Spark. Am I missing a JAR? Or is there another way of executing Spark code? Please help.
 
Thanks,
Gaurav Dasgupta 

Jaka Jancar

Oct 1, 2012, 5:16:40 AM
to spark...@googlegroups.com
A good list of best practices for packaging and running Spark-based apps would be great, e.g.:

  - I package my app with sbt-assembly: should I include Spark's assembly in mine or not? Previously I thought yes, but you'll need a separate Spark checkout on Mesos anyways, so maybe not.

  - Since there's no Maven/Ivy repo (especially if you're using dev versions) where you could specify "provided" scope, you have to copy Spark's assembly to lib/ and mess a bit with SBT to exclude it from your own uber-jar (see the sketch after this list).

  - How do you run it afterwards? With spark/run, or some other way?

...
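
To make the "provided"/exclusion point concrete, here is a rough build.sbt sketch. Nothing in it comes from this thread: the Maven coordinates are invented, and the exclusion keys follow sbt-assembly's own documentation, so they may differ for other plugin versions.

// (a) If Spark were resolvable from a Maven/Ivy repository, "provided" scope would keep
//     it out of the uber-jar (coordinates below are purely illustrative):
// libraryDependencies += "org.spark-project" % "spark-core_2.9.2" % "0.6.0" % "provided"

// (b) With the Spark assembly copied into lib/ as an unmanaged dependency, exclude it
//     from your own assembly so that only your application classes get bundled:
assembly / assemblyExcludedJars := {
  val cp = (assembly / fullClasspath).value
  cp.filter(_.data.getName.startsWith("spark-core-assembly"))
}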

Jaka Jancar

Oct 1, 2012, 5:31:41 AM
to spark...@googlegroups.com
Another example: you can set SPARK_HOME in at least three ways. That's great, but what is each of them for? Which one should you use?

Gaurav Dasgupta

Oct 1, 2012, 7:49:06 AM
to spark...@googlegroups.com
Here are my paths:

MESOS_HOME=/usr/local/mesos
SPARK_HOME=/usr/local/mesos/spark
SCALA_HOME=/usr/local/mesos/scala

My spark-env.sh file contains:

#!/usr/bin/env bash
# Set Spark environment variables for your site in this file. Some useful
# variables to set are:
export MESOS_HOME=/usr/local/mesos
# - MESOS_NATIVE_LIBRARY, to point to your Mesos native library (libmesos.so)
export MESOS_NATIVE_LIBRARY=/usr/local/mesos/src/.libs/libmesos.so
# - SCALA_HOME, to point to your Scala installation
export SCALA_HOME=/usr/local/mesos/scala-2.9.2
# - SPARK_CLASSPATH, to add elements to Spark's classpath
# - SPARK_JAVA_OPTS, to add JVM options
# - SPARK_MEM, to change the amount of memory used per node (this should
#   be in the same format as the JVM's -Xmx option, e.g. 300m or 1g).
export SPARK_MEM=200m
# - SPARK_LIBRARY_PATH, to add extra search paths for native libraries.

Can you explain the steps I should follow to run my Spark code? I have also referred to https://github.com/mesos/spark/wiki/Running-Spark-Demo-Guide, but that is not working for me either.

My code is:

import spark.SparkContext
import SparkContext._

object SparkTest {
  def main(args: Array[String]) {
    if (args.length == 0) {
      System.err.println("Usage: SparkTest <host> [<slices>]")
      System.exit(1)
    }
    val spark = new SparkContext(args(0), "SparkTest")
    val slices = if (args.length > 1) args(1).toInt else 2
    val myFile = spark.textFile("/opt/test.txt")
    val counts = myFile.flatMap(line => line.split(" "))
                        .map(word => (word, 1))
                        .reduceByKey(_ + _)

    counts.saveAsTextFile("/opt/out2.txt")
    System.exit(0)
  }
}

By whatever means I have tried so far, I am unable to run this code: it gives the error "value spark not found".
If I copy it to the examples directory in SPARK_HOME and try to execute it using the "./run" command (after adding "package spark.examples" to the code), it doesn't run either.

Please help me here.

Matei Zaharia

Oct 1, 2012, 12:50:24 PM
to spark...@googlegroups.com
The "value spark not found" means that you haven't included Spark on your classpath at runtime. The easiest way to do that is to build it into a JAR. Try the following:

- Go into Spark and do sbt/sbt assembly. This builds core/target/spark-*-assembly.jar, which contains Spark and all its dependencies.

- Now, in the folder with your program, run

scalac -classpath spark-*-assembly.jar SparkTest.scala

- Now, run

scala -classpath spark-*-assembly.jar SparkTest


Matei

Matei Zaharia

Oct 1, 2012, 12:52:26 PM
to spark...@googlegroups.com
I'm assuming you want to run Spark applications from an IDE, not to build Spark itself in one. The easiest way then is to do sbt/sbt assembly in Spark to create a spark-*-assembly.jar, which contains all of Spark and all of its dependencies. Then just add this to your IDE project the same way you would add any JAR.

From an IDE you should easily be able to run programs in "local" mode. To run on a cluster you will need to package your code into a JAR and pass that to SparkContext, as described in https://github.com/mesos/spark/wiki/Spark-Programming-Guide.
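
As a concrete example, a minimal program like the following should run straight from the IDE in "local" mode once the assembly JAR is on the project classpath (the object name, thread count, and numbers are illustrative, not taken from this thread):

import spark.SparkContext

object LocalSmokeTest {
  def main(args: Array[String]) {
    val sc = new SparkContext("local[2]", "LocalSmokeTest")  // local mode, 2 threads
    val sum = sc.parallelize(1 to 1000).reduce(_ + _)        // tiny sanity-check job
    println("sum = " + sum)
  }
}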

Matei

Matei Zaharia

Oct 1, 2012, 12:58:03 PM
to spark...@googlegroups.com
Good questions. The recommended way to run Spark right now is the one in the programming guide: https://github.com/mesos/spark/wiki/Spark-Programming-Guide under "Linking with Spark". There are actually two ways to add a dependency: either do sbt assembly, which builds a big JAR, or do sbt publish-local, which publishes Spark to your local Ivy directory. In either case, you don't need to use spark/run; once you have those classes on your classpath, and you've passed your JARs and Spark home to SparkContext, you're set.
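
For the publish-local route, the dependency line in your own project's build would then look roughly like this; the organization and version strings below are assumptions, so use whatever coordinates sbt/sbt publish-local actually reports for your checkout:

// build.sbt sketch (coordinates are assumptions, not taken from this thread):
libraryDependencies += "org.spark-project" % "spark-core_2.9.2" % "0.5.1-SNAPSHOT"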

In the near future we're going to make Spark available on Maven (this has been blocking on making Mesos available there) and it will become easier to link to it without even having to do publish-local. We're also updating the docs to have a "quick start" tutorial that shows the best way to set up a Scala or Java standalone app project.

Matei

Jaka Jancar

Oct 1, 2012, 4:30:52 PM
to spark...@googlegroups.com
I was thinking more along the lines of what is the recommended setup for a multi-user Spark cluster on Mesos (so that when you get to that point, your app is ready to go):

 1. Spark per app: Assemble Spark into the JAR of your app.
 2. Spark per user: Let each user have an external Spark (or multiple versions). When compiling, list Spark as a "provided" dependency, then run your app JARs with it, e.g. using spark/run.
 3. Spark per cluster: Have a global Spark install, for all users.

I'm leaning towards #2, though I think with Hadoop it's usually #3.

Matei Zaharia

Oct 1, 2012, 6:27:50 PM
to spark...@googlegroups.com
I think most people have done #3, having essentially one SPARK_HOME on each of the machines. However, part of the point of Mesos is to enable #2, if you want to run two versions of Spark concurrently.

Also, just to be clear, I don't really recommend using the "run" script to run user programs. It's mostly meant to run the examples and any "main" programs Spark itself requires (e.g. spark-shell, spark-executor when running on Mesos, and the standalone master and workers). User programs should be able to just add Spark to the classpath however they prefer (assembly or publish-local) and then run like any other Java app. The script doesn't do anything special beyond setting up the classpath for our own "main" programs. This is what the docs, etc will guide people towards in the future.

Matei

Gaurav Dasgupta

Oct 2, 2012, 5:37:45 PM
to spark...@googlegroups.com
Hi Matei,

Thanks for the reply, but the steps are not working for me. I compiled the code with the spark-core assembly on the classpath, and it compiled without any error. But when I tried to run it, it threw an error. Here it is:

[root@localhost opt]# scalac -classpath core/target/spark-core-assembly-0.5.1-SNAPSHOT.jar SparkTest.scala
[root@localhost opt]# scala -classpath core/target/spark-core-assembly-0.5.1-SNAPSHOT.jar SparkTest
java.lang.NoSuchMethodException: SparkTest.main([Ljava.lang.String;)
    at java.lang.Class.getMethod(Class.java:1622)
    at scala.tools.nsc.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:74)
    at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:101)
    at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:33)
    at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:40)
    at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:56)
    at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:80)
    at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:89)
    at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
[root@localhost opt]#


Here is my modified code once again (I have hard-coded the "localhost:5050" argument in the main method):

import spark.SparkContext
import SparkContext._

object SparkTest {
  def main() {
 
   // if (args.length == 0) {
     // System.err.println("Usage: SparkTest <host> [<slices>]")
     // System.exit(1)
   // }
 
    val sc = new SparkContext("localhost:5050", "SparkTest")
    //val slices = if (args.length > 1) args(1).toInt else 2
    val slices = 2
    val myFile = sc.textFile("/opt/test.txt")

    val counts = myFile.flatMap(line => line.split(" "))
                        .map(word => (word, 1))
                        .reduceByKey(_ + _)
 
    counts.saveAsTextFile("/opt/output.txt")
  }
}


I am also unable to run it from the Scala IDE (the required JAR is added to the project). It compiles fine, which means it can find the Spark classes in the code, but it doesn't run.
Please tell me what's going wrong.

Thanks,
Gaurav Dasgupta

Matei Zaharia

Oct 2, 2012, 9:09:01 PM
to spark...@googlegroups.com
You need to change def main() to def main(args: Array[String]). That is the required signature for a main() method, much like in Java. That's actually what the error message is saying too, though it's a bit confusing: it asks for SparkTest.main([Ljava.lang.String;), where [Ljava.lang.String; is the JVM's class name for String[].
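
For reference, the corrected entry point would look like the sketch below, which is just the earlier SparkTest object with the signature fixed (the fallback master URL is only illustrative):

import spark.SparkContext
import SparkContext._

object SparkTest {
  // The JVM (and the scala runner) look for exactly this signature.
  def main(args: Array[String]) {
    val master = if (args.length > 0) args(0) else "localhost:5050"
    val sc = new SparkContext(master, "SparkTest")
    val counts = sc.textFile("/opt/test.txt")
                   .flatMap(line => line.split(" "))
                   .map(word => (word, 1))
                   .reduceByKey(_ + _)
    counts.saveAsTextFile("/opt/output.txt")
    System.exit(0)
  }
}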

By the way, for general Scala help, I suggest checking out this free book: http://www.artima.com/pins1ed/.

Matei

Gaurav Dasgupta

Oct 3, 2012, 3:41:41 AM
to spark...@googlegroups.com
Hi Matei,

I did what you said, and it has resolved the problem to some extent. But now I am getting the following error:

[root@localhost opt]# scalac -classpath /usr/local/mesos/spark/core/target/spark-core-assembly-0.5.1-SNAPSHOT.jar SparkTest.scala
[root@localhost opt]# scala -classpath /usr/local/mesos/spark/core/target/spark-core-assembly-0.5.1-SNAPSHOT.jar SparkTest localhost:5050
12/10/03 13:04:56 INFO spark.BoundedMemoryCache: BoundedMemoryCache.maxBytes = 171284889
12/10/03 13:04:57 INFO spark.CacheTrackerActor: Registered actor on port 7077
12/10/03 13:04:57 INFO spark.CacheTrackerActor: Started slave cache (size 163.3MB) on localhost.localdomain
12/10/03 13:04:57 INFO spark.MapOutputTrackerActor: Registered actor on port 7077
12/10/03 13:04:57 INFO spark.ShuffleManager: Shuffle dir: /tmp/spark-local-1a2479bb-0af1-48e6-af18-2ad09fbf2671/shuffle
12/10/03 13:04:58 INFO server.Server: jetty-7.x.y-SNAPSHOT
12/10/03 13:04:58 INFO server.AbstractConnector: Started SelectChann...@0.0.0.0:43457 STARTING
12/10/03 13:04:58 INFO spark.ShuffleManager: Local URI: http://127.0.0.1:43457
12/10/03 13:04:59 INFO server.Server: jetty-7.x.y-SNAPSHOT
12/10/03 13:04:59 INFO server.AbstractConnector: Started SelectChann...@0.0.0.0:59867 STARTING
12/10/03 13:04:59 INFO broadcast.HttpBroadcast: Broadcast server started at http://127.0.0.1:59867
Failed to load native Mesos library from /usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
java.lang.UnsatisfiedLinkError: no mesos in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1860)
    at java.lang.Runtime.loadLibrary0(Runtime.java:845)
    at java.lang.System.loadLibrary(System.java:1084)
    at org.apache.mesos.MesosNativeLibrary.load(MesosNativeLibrary.java:46)
    at spark.SparkContext.<init>(SparkContext.scala:77)
    at SparkTest$.main(SparkTest.scala:11)
    at SparkTest.main(SparkTest.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at scala.tools.nsc.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:78)
    at scala.tools.nsc.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:24)
    at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:88)
    at scala.tools.nsc.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:78)

    at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:101)
    at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:33)
    at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:40)
    at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:56)
    at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:80)
    at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:89)
    at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
[root@localhost opt]#

I am also getting the above error (java.lang.UnsatisfiedLinkError: no mesos in java.library.path) when running the same code from the Scala IDE. Do I have to add "libmesos.so" somewhere other than in "spark-env.sh" as well?

Thanks,
Gaurav Dasgupta

Gaurav Dasgupta

Oct 3, 2012, 3:58:14 AM
to spark...@googlegroups.com
ok. I copied "libmesos.so" in the desired locations and set the SPAK_HOME in /etc/profile and then the problem got resolved. I can execute the code and its getting registered in the Mesos Framework. But it is not progressing further after adding the job to Mesos Framework:

[root@localhost opt]# scala -classpath /usr/local/mesos/spark/core/target/spark-core-assembly-0.5.1-SNAPSHOT.jar SparkTest localhost:5050
12/10/03 13:19:39 INFO spark.BoundedMemoryCache: BoundedMemoryCache.maxBytes = 171284889
12/10/03 13:19:40 INFO spark.CacheTrackerActor: Registered actor on port 7077
12/10/03 13:19:40 INFO spark.CacheTrackerActor: Started slave cache (size 163.3MB) on localhost.localdomain
12/10/03 13:19:41 INFO spark.MapOutputTrackerActor: Registered actor on port 7077
12/10/03 13:19:41 INFO spark.ShuffleManager: Shuffle dir: /tmp/spark-local-4871e04b-fd0d-42f2-b4e4-24e7ae0d7984/shuffle
12/10/03 13:19:43 INFO server.Server: jetty-7.x.y-SNAPSHOT
12/10/03 13:19:44 INFO server.AbstractConnector: Started SelectChann...@0.0.0.0:41686 STARTING
12/10/03 13:19:44 INFO spark.ShuffleManager: Local URI: http://127.0.0.1:41686
12/10/03 13:19:45 INFO server.Server: jetty-7.x.y-SNAPSHOT
12/10/03 13:19:45 INFO server.AbstractConnector: Started SelectChann...@0.0.0.0:52155 STARTING
12/10/03 13:19:45 INFO broadcast.HttpBroadcast: Broadcast server started at http://127.0.0.1:52155
12/10/03 13:19:51 INFO spark.MesosScheduler: Registered as framework ID 201210031259-16777343-5050-2983-0000
12/10/03 13:19:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/10/03 13:19:55 WARN snappy.LoadSnappy: Snappy native library not loaded
12/10/03 13:19:55 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/03 13:19:57 INFO spark.PairRDDFunctions: Saving as hadoop file of type (NullWritable, Text)
12/10/03 13:19:57 INFO spark.SparkContext: Starting job...
12/10/03 13:19:57 INFO spark.CacheTracker: Registering RDD ID 5 with cache
12/10/03 13:19:57 INFO spark.CacheTrackerActor: Registering RDD 5 with 8 partitions
12/10/03 13:19:58 INFO spark.CacheTracker: Registering RDD ID 4 with cache
12/10/03 13:19:58 INFO spark.CacheTrackerActor: Registering RDD 4 with 8 partitions
12/10/03 13:19:58 INFO spark.CacheTracker: Registering RDD ID 3 with cache
12/10/03 13:19:58 INFO spark.CacheTrackerActor: Registering RDD 3 with 2 partitions
12/10/03 13:19:58 INFO spark.CacheTracker: Registering RDD ID 2 with cache
12/10/03 13:19:58 INFO spark.CacheTrackerActor: Registering RDD 2 with 2 partitions
12/10/03 13:19:58 INFO spark.CacheTracker: Registering RDD ID 1 with cache
12/10/03 13:19:58 INFO spark.CacheTrackerActor: Registering RDD 1 with 2 partitions
12/10/03 13:19:58 INFO spark.CacheTracker: Registering RDD ID 0 with cache
12/10/03 13:19:58 INFO spark.CacheTrackerActor: Registering RDD 0 with 2 partitions
12/10/03 13:19:58 INFO spark.CacheTrackerActor: Asked for current cache locations
12/10/03 13:19:58 INFO spark.MesosScheduler: Final stage: Stage 0
12/10/03 13:19:58 INFO spark.MesosScheduler: Parents of final stage: List(Stage 1)
12/10/03 13:19:58 INFO spark.MesosScheduler: Missing parents: List(Stage 1)
12/10/03 13:19:58 INFO spark.MesosScheduler: Submitting Stage 1, which has no missing parents
12/10/03 13:19:58 INFO spark.MesosScheduler: Got a job with 2 tasks
12/10/03 13:19:59 INFO spark.MesosScheduler: Adding job with ID 0


The job hangs at this stage: no error, no progress, nothing. SPARK_MEM is set to 200m in my spark-env.sh, so that should not be an issue. I can run this same code from spark-shell by copy-pasting it into the shell, but not from the terminal.
What can be the reason?

Thanks,
Gaurav Dasgupta

Gaurav Dasgupta

Oct 3, 2012, 6:14:44 AM
to spark...@googlegroups.com
Hi Matei,

Reading from another blog post where you answered a similar issue, I restarted the Mesos slave with increased memory, and now the job no longer hangs. But the problem doesn't end here: the tasks are failing because the executors are unable to find the SparkTest class files:

[root@localhost opt]# scala -classpath /usr/local/mesos/spark/core/target/spark-core-assembly-0.5.1-SNAPSHOT.jar SparkTest master@localhost:5050
12/10/03 15:35:08 INFO spark.BoundedMemoryCache: BoundedMemoryCache.maxBytes = 171284889
12/10/03 15:35:08 INFO spark.CacheTrackerActor: Registered actor on port 7077
12/10/03 15:35:08 INFO spark.CacheTrackerActor: Started slave cache (size 163.3MB) on localhost.localdomain
12/10/03 15:35:08 INFO spark.MapOutputTrackerActor: Registered actor on port 7077
12/10/03 15:35:08 INFO spark.ShuffleManager: Shuffle dir: /tmp/spark-local-6f21a675-141d-46af-8a40-c10c2e97d5c9/shuffle
12/10/03 15:35:08 INFO server.Server: jetty-7.x.y-SNAPSHOT
12/10/03 15:35:08 INFO server.AbstractConnector: Started SelectChann...@0.0.0.0:47114 STARTING
12/10/03 15:35:08 INFO spark.ShuffleManager: Local URI: http://127.0.0.1:47114
12/10/03 15:35:08 INFO server.Server: jetty-7.x.y-SNAPSHOT
12/10/03 15:35:08 INFO server.AbstractConnector: Started SelectChann...@0.0.0.0:33769 STARTING
12/10/03 15:35:08 INFO broadcast.HttpBroadcast: Broadcast server started at http://127.0.0.1:33769
12/10/03 15:35:08 INFO spark.MesosScheduler: Registered as framework ID 201210031522-16777343-5050-11350-0005
12/10/03 15:35:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/10/03 15:35:08 WARN snappy.LoadSnappy: Snappy native library not loaded
12/10/03 15:35:08 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/03 15:35:09 INFO spark.PairRDDFunctions: Saving as hadoop file of type (NullWritable, Text)
12/10/03 15:35:09 INFO spark.SparkContext: Starting job...
12/10/03 15:35:09 INFO spark.CacheTracker: Registering RDD ID 5 with cache
12/10/03 15:35:09 INFO spark.CacheTrackerActor: Registering RDD 5 with 8 partitions
12/10/03 15:35:09 INFO spark.CacheTracker: Registering RDD ID 4 with cache
12/10/03 15:35:09 INFO spark.CacheTrackerActor: Registering RDD 4 with 8 partitions
12/10/03 15:35:09 INFO spark.CacheTracker: Registering RDD ID 3 with cache
12/10/03 15:35:09 INFO spark.CacheTrackerActor: Registering RDD 3 with 2 partitions
12/10/03 15:35:09 INFO spark.CacheTracker: Registering RDD ID 2 with cache
12/10/03 15:35:09 INFO spark.CacheTrackerActor: Registering RDD 2 with 2 partitions
12/10/03 15:35:09 INFO spark.CacheTracker: Registering RDD ID 1 with cache
12/10/03 15:35:09 INFO spark.CacheTrackerActor: Registering RDD 1 with 2 partitions
12/10/03 15:35:09 INFO spark.CacheTracker: Registering RDD ID 0 with cache
12/10/03 15:35:09 INFO spark.CacheTrackerActor: Registering RDD 0 with 2 partitions
12/10/03 15:35:09 INFO spark.CacheTrackerActor: Asked for current cache locations
12/10/03 15:35:09 INFO spark.MesosScheduler: Final stage: Stage 0
12/10/03 15:35:09 INFO spark.MesosScheduler: Parents of final stage: List(Stage 1)
12/10/03 15:35:09 INFO spark.MesosScheduler: Missing parents: List(Stage 1)
12/10/03 15:35:09 INFO spark.MesosScheduler: Submitting Stage 1, which has no missing parents
12/10/03 15:35:09 INFO spark.MesosScheduler: Got a job with 2 tasks
12/10/03 15:35:09 INFO spark.MesosScheduler: Adding job with ID 0
12/10/03 15:35:09 INFO spark.SimpleJob: Starting task 0:0 as TID 0 on slave 201210031522-16777343-5050-11350-3: localhost.localdomain (preferred)
12/10/03 15:35:09 INFO spark.SimpleJob: Size of task 0:0 is 10010 bytes and took 156 ms to serialize by spark.JavaSerializerInstance
12/10/03 15:35:09 INFO spark.SimpleJob: Starting task 0:1 as TID 1 on slave 201210031522-16777343-5050-11350-3: localhost.localdomain (preferred)
12/10/03 15:35:09 INFO spark.SimpleJob: Size of task 0:1 is 10010 bytes and took 12 ms to serialize by spark.JavaSerializerInstance
12/10/03 15:35:10 INFO spark.CacheTrackerActor: Started slave cache (size 127.6MB) on localhost.localdomain
12/10/03 15:35:12 INFO spark.SimpleJob: Lost TID 0 (task 0:0)
12/10/03 15:35:12 INFO spark.SimpleJob: Loss was due to java.lang.ClassNotFoundException: SparkTest$$anonfun$2
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.scala$tools$nsc$util$ScalaClassLoader$$super$findClass(ScalaClassLoader.scala:88)
    at scala.tools.nsc.util.ScalaClassLoader$class.findClass(ScalaClassLoader.scala:44)
    at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.findClass(ScalaClassLoader.scala:88)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.scala$tools$nsc$util$ScalaClassLoader$$super$loadClass(ScalaClassLoader.scala:88)
    at scala.tools.nsc.util.ScalaClassLoader$class.loadClass(ScalaClassLoader.scala:50)
    at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.loadClass(ScalaClassLoader.scala:88)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at spark.JavaSerializerInstance$$anon$2.resolveClass(JavaSerializer.scala:41)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1593)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1514)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1750)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1964)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1888)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1964)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1888)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:435)

    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1004)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1866)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1964)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1888)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1964)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1888)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
    at spark.JavaSerializerInstance.deserialize(JavaSerializer.scala:43)
    at spark.Executor$TaskRunner.run(Executor.scala:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
12/10/03 15:35:12 INFO spark.SimpleJob: Lost TID 1 (task 0:1)
12/10/03 15:35:12 INFO spark.SimpleJob: Loss was due to java.lang.ClassNotFoundException: SparkTest$$anonfun$2 [duplicate 1]
12/10/03 15:35:12 INFO spark.SimpleJob: Starting task 0:1 as TID 2 on slave 201210031522-16777343-5050-11350-3: localhost.localdomain (preferred)
12/10/03 15:35:12 INFO spark.SimpleJob: Size of task 0:1 is 10010 bytes and took 4 ms to serialize by spark.JavaSerializerInstance
12/10/03 15:35:12 INFO spark.SimpleJob: Starting task 0:0 as TID 3 on slave 201210031522-16777343-5050-11350-3: localhost.localdomain (preferred)
12/10/03 15:35:12 INFO spark.SimpleJob: Size of task 0:0 is 10010 bytes and took 14 ms to serialize by spark.JavaSerializerInstance
12/10/03 15:35:13 INFO spark.SimpleJob: Lost TID 3 (task 0:0)
12/10/03 15:35:13 INFO spark.SimpleJob: Lost TID 2 (task 0:1)
12/10/03 15:35:13 INFO spark.SimpleJob: Starting task 0:1 as TID 4 on slave 201210031522-16777343-5050-11350-3: localhost.localdomain (preferred)
12/10/03 15:35:13 INFO spark.SimpleJob: Size of task 0:1 is 10010 bytes and took 34 ms to serialize by spark.JavaSerializerInstance
12/10/03 15:35:13 INFO spark.SimpleJob: Starting task 0:0 as TID 5 on slave 201210031522-16777343-5050-11350-3: localhost.localdomain (preferred)
12/10/03 15:35:13 INFO spark.SimpleJob: Size of task 0:0 is 10010 bytes and took 13 ms to serialize by spark.JavaSerializerInstance
12/10/03 15:35:15 INFO spark.CacheTrackerActor: Started slave cache (size 127.6MB) on localhost.localdomain
12/10/03 15:35:16 INFO spark.SimpleJob: Lost TID 5 (task 0:0)
12/10/03 15:35:16 INFO spark.SimpleJob: Loss was due to java.lang.ClassNotFoundException: SparkTest$$anonfun$2 [duplicate 2]
12/10/03 15:35:16 INFO spark.SimpleJob: Lost TID 4 (task 0:1)
12/10/03 15:35:16 INFO spark.SimpleJob: Loss was due to java.lang.ClassNotFoundException: SparkTest$$anonfun$2 [duplicate 3]
12/10/03 15:35:16 INFO spark.SimpleJob: Starting task 0:1 as TID 6 on slave 201210031522-16777343-5050-11350-3: localhost.localdomain (preferred)
12/10/03 15:35:16 INFO spark.SimpleJob: Size of task 0:1 is 10010 bytes and took 28 ms to serialize by spark.JavaSerializerInstance
12/10/03 15:35:16 INFO spark.SimpleJob: Starting task 0:0 as TID 7 on slave 201210031522-16777343-5050-11350-3: localhost.localdomain (preferred)
12/10/03 15:35:16 INFO spark.SimpleJob: Size of task 0:0 is 10010 bytes and took 5 ms to serialize by spark.JavaSerializerInstance
12/10/03 15:35:18 INFO spark.SimpleJob: Lost TID 7 (task 0:0)
12/10/03 15:35:18 INFO spark.SimpleJob: Lost TID 6 (task 0:1)
12/10/03 15:35:18 INFO spark.SimpleJob: Starting task 0:1 as TID 8 on slave 201210031522-16777343-5050-11350-3: localhost.localdomain (preferred)
12/10/03 15:35:18 INFO spark.SimpleJob: Size of task 0:1 is 10010 bytes and took 15 ms to serialize by spark.JavaSerializerInstance
12/10/03 15:35:18 INFO spark.SimpleJob: Starting task 0:0 as TID 9 on slave 201210031522-16777343-5050-11350-3: localhost.localdomain (preferred)
12/10/03 15:35:18 INFO spark.SimpleJob: Size of task 0:0 is 10010 bytes and took 18 ms to serialize by spark.JavaSerializerInstance
12/10/03 15:35:20 INFO spark.CacheTrackerActor: Started slave cache (size 127.6MB) on localhost.localdomain
12/10/03 15:35:21 INFO spark.SimpleJob: Lost TID 8 (task 0:1)
12/10/03 15:35:21 INFO spark.SimpleJob: Loss was due to java.lang.ClassNotFoundException: SparkTest$$anonfun$2 [duplicate 4]
12/10/03 15:35:21 ERROR spark.SimpleJob: Task 0:1 failed more than 4 times; aborting job
spark.SparkException: Task failed: ShuffleMapTask(1, 1), reason: ExceptionFailure(java.lang.ClassNotFoundException: SparkTest$$anonfun$2)
    at spark.DAGScheduler$class.runJob(DAGScheduler.scala:313)
    at spark.MesosScheduler.runJob(MesosScheduler.scala:26)
    at spark.SparkContext.runJob(SparkContext.scala:316)
    at spark.SparkContext.runJob(SparkContext.scala:334)
    at spark.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:376)
    at spark.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:339)
    at spark.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:268)
    at spark.RDD.saveAsTextFile(RDD.scala:256)
    at SparkTest$.main(SparkTest.scala:19)

    at SparkTest.main(SparkTest.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at scala.tools.nsc.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:78)
    at scala.tools.nsc.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:24)
    at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:88)
    at scala.tools.nsc.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:78)
    at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:101)
    at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:33)
    at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:40)
    at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:56)
    at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:80)
    at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:89)
    at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
12/10/03 15:35:21 INFO spark.MesosScheduler: Ignoring update from TID 9 because its job is gone
[root@localhost opt]#


But all the classes mentioned in the above error are present in the working directory (/opt) from which I am running the code. Where exactly is it trying to locate the classes? What should I do now?

Thanks,
Gaurav Dasgupta

Matei Zaharia

Oct 3, 2012, 11:19:25 AM
to spark...@googlegroups.com
Hi Gaurav,

You need to tell Spark where the classes are so that it can find them on the slaves as well. The best way to do this is to package them into a JAR and pass that as an argument to SparkContext, as described in https://github.com/mesos/spark/wiki/Spark-Programming-Guide under "Initializing Spark" (this is also where you pass the location of Spark on the cluster). Another, simpler but less flexible way is to edit your conf/spark-env.sh and add export SPARK_CLASSPATH=/opt (or whatever path you want); then you'd need to manually copy the code to that path on each machine.
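
Concretely, the first option boils down to the four-argument constructor described in the guide above; a sketch follows, with every path illustrative (the JAR could be produced with jar cf or sbt package and must contain the SparkTest classes, including the compiled anonymous functions):

import spark.SparkContext

object SparkTest {
  def main(args: Array[String]) {
    val sc = new SparkContext(
      "localhost:5050",            // Mesos master, as in the runs above
      "SparkTest",                 // job name
      "/usr/local/mesos/spark",    // where Spark lives on the cluster machines
      List("/opt/sparktest.jar"))  // JAR holding SparkTest*.class
    // ... the word count itself stays exactly as before ...
  }
}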

Matei

Gaurav Dasgupta

Oct 4, 2012, 3:56:56 AM
to spark...@googlegroups.com
Thanks, Matei. It finally worked!