I have a mytest.jar containing only one class, mytest.Test, which is simply defined as:
case class Test(x : String)
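(For reference, the jar contains nothing but that one class; the source file looks roughly like the following, with the package declaration matching the mytest.Test name:)

// Test.scala -- the only class packaged into mytest.jar
package mytest

case class Test(x: String)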
I run spark-shell with SPARK_CLASSPATH including mytest.jar.
Inside the shell I do:
scala> import mytest._
import mytest._
scala> sc.addJar("mytest.jar")
13/06/20 13:06:00 INFO spark.SparkContext: Added JAR mytest.jar at
http://192.168.3.175:37542/myjars/mytest.jar with timestamp 1371747960512
scala> val r = sc.parallelize(List(Test("x1"), Test("x2"), Test("x3"), Test("x4")))
r: spark.RDD[mytest.Test] = ParallelCollectionRDD[4] at parallelize at <console>:18
scala> r.filter{ case Test(str) => str == "x1" }.collect
This gives me errors like:
13/06/20 13:08:38 INFO cluster.TaskSetManager: Loss was due to java.lang.NoClassDefFoundError: mytest/Test [duplicate 1]
It looks like the slaves don't know about my new class. Is there a way to make this work? It would be nice to be able to do interactive analysis like this.
I considered putting the jar on every slave's classpath and restarting the slaves, but I don't consider that an acceptable or scalable (in terms of my time) solution.
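One variation I can still try, in case it matters, is giving sc.addJar an absolute path instead of the bare file name, since the log above shows the driver serving the jar over HTTP and I'm not sure how the relative path gets resolved (the path below is just an example from my machine):

sc.addJar("/home/me/mytest.jar")  // absolute path instead of "mytest.jar"
val r = sc.parallelize(List(Test("x1"), Test("x2")))
r.filter { case Test(str) => str == "x1" }.collect()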