Exception in thread "main" java.lang.NoClassDefFoundError: scala/reflect/ClassManifest


ej.fe...@campus.fct.unl.pt

Jun 12, 2013, 9:40:02 AM
to scala...@googlegroups.com
I'm trying Spark, but I can't run my code.
I'm using IntelliJ with Maven and a Java project.
I added this dependency:
<dependency>
            <groupId>org.spark-project</groupId>
            <artifactId>spark-core_2.9.3</artifactId>
            <version>0.7.2</version>
</dependency>

But when I run mvn package and try to run it like this:

java -jar JAR_NAME.jar PARAMS
I get the following error:
Error: Invalid or corrupt jarfile target/JAR_NAME.jar

But if I put:
<dependency>
            <groupId>org.spark-project</groupId>
            <artifactId>spark-core_2.9.3</artifactId>
            <version>0.7.1</version>
</dependency>

I get a different error:
Exception in thread "main" java.lang.NoClassDefFoundError: scala/reflect/ClassManifest

It happens on this line:

JavaSparkContext jsc = new JavaSparkContext(master_host, app_name,
                SPARK_HOME, new String[]{JAR_NAME});

Thanks

Simon Ochsenreither

Jun 12, 2013, 9:51:52 AM
to scala...@googlegroups.com
Did you include the appropriate scala-library.jar on your classpath?
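
For example, something like this (just a sketch; the scala-library.jar path and your.package.Main are placeholders for your own Scala installation and main class):

# the scala-library.jar path and the main class name below are placeholders
java -cp $SCALA_HOME/lib/scala-library.jar:target/JAR_NAME.jar your.package.Main PARAMS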

ej.fe...@campus.fct.unl.pt

Jun 12, 2013, 10:06:07 AM
to scala...@googlegroups.com
OK, I tried that:

 <dependency>
            <groupId>org.spark-project</groupId>
            <artifactId>spark-core_2.9.3</artifactId>
            <version>0.7.1</version>
        </dependency>

        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.9.3</version>
        </dependency>

And I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

Then I added:

<dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>0.20.2</version>
        </dependency>

And I get:
15:01:48.743 [main] INFO  s.SparkEnv - Registering BlockManagerMaster
15:01:48.772 [main] INFO  s.s.MemoryStore - MemoryStore started with capacity 1161.6 MB.
....
...
15:01:49.782 [main] DEBUG o.a.h.c.Configuration - java.io.IOException: config()
at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:211)
at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:198)
at spark.SparkContext.<init>(SparkContext.scala:191)
at spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:38)
        at ....main.Main.main(Main.java:47)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
15:01:49.802 [DAGScheduler] TRACE s.s.DAGScheduler - Checking for newly runnable parent stages
15:01:49.802 [DAGScheduler] TRACE s.s.DAGScheduler - running: Set()
15:01:49.802 [DAGScheduler] TRACE s.s.DAGScheduler - waiting: Set()
15:01:49.802 [DAGScheduler] TRACE s.s.DAGScheduler - failed: Set()
..
Exception in thread "main" java.lang.IncompatibleClassChangeError: Implementing class
My code:

JavaSparkContext jsc = new JavaSparkContext(master, "SPARK by JAVA",
                SPARK_HOME, JAR_NAME);

JavaRDD<Task> tasks = jsc.parallelize(taskList, taskList.size());
            VoidFunction<Task> function = new VoidFunction<Task>() {
                @Override
                public void call(Task task) throws Exception {
                    task.execute();
                }
            };
            tasks.foreach(function);

And the stack trace continues with:
at spark.api.java.JavaRDDLike$class.foreach(JavaRDDLike.scala:191)

So it happens on the line:
tasks.foreach(function);

So I think something in the dependencies is wrong!

ScottC

Jun 12, 2013, 1:49:03 PM
to scala...@googlegroups.com
Hadoop's dependencies are a complete disaster. Have a look at what mvn dependency:tree (or the equivalent in sbt) shows. You may need to add exclusions there to prevent duplicate or conflicting classes where different artifacts provide overlapping classes.

You may also need a newer Hadoop dependency, one that Spark was compiled against (likely from the 1.0.x or 2.0.x line rather than 0.20.x).
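
An exclusion would look roughly like this (a sketch only; hadoop-core here is just an illustration, check mvn dependency:tree to see which artifact is actually conflicting in your build):

<dependency>
            <groupId>org.spark-project</groupId>
            <artifactId>spark-core_2.9.3</artifactId>
            <version>0.7.1</version>
            <!-- example exclusion; the real conflicting artifact may differ -->
            <exclusions>
                <exclusion>
                    <groupId>org.apache.hadoop</groupId>
                    <artifactId>hadoop-core</artifactId>
                </exclusion>
            </exclusions>
</dependency>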

ej.fe...@campus.fct.unl.pt

Jun 14, 2013, 5:42:30 AM
to scala...@googlegroups.com
I tried something else: I removed the Scala dependency and changed the Hadoop dependency:
<dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>1.2.0</version>
        </dependency>

And I try to run the program like this (in a bash script):

SCALA_LIBS=$SCALA_HOME/lib/scala-library.jar:$SCALA_HOME/lib/jline.jar:$SCALA_HOME/lib/scala-compiler.jar
JARS=$SPARK_HOME/lib_managed/jars/*
java -cp $SCALA_LIBS:$JARS:target/PROGRAM_JAR.jar Main ...

My code looks like this:

public class FrameworkTestSpark extends AbstractBackend {
    public FrameworkTestSpark(TaskManager taskManager, String master, int nrNodes) {
        super(taskManager);
        this.nrNodes = nrNodes;
        executeTasks(master);
    }

    private void executeTasks(String master) {
        JavaSparkContext jsc = new JavaSparkContext(master, "Parameter Sweep by Spark",
                SPARK_HOME, BackendUtilities.getJarPath());

        List<Task> taskList;
        while ((taskList = nextNTasks(nrNodes)) != null && !taskList.isEmpty()) {
            JavaRDD<Task> tasks = jsc.parallelize(taskList, taskList.size());
            tasks.foreach(new TaskFunction());
        }
    }

    class TaskFunction extends VoidFunction<Task> {
        @Override
        public void call(Task task) throws Exception {

        }
    }
}

And I get this:

13/06/14 10:34:45 WARN spark.Utils: Your hostname, eduZai resolves to a loopback address: 127.0.1.1; using 172.20.62.231 instead (on interface eth0)
13/06/14 10:34:45 WARN spark.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
13/06/14 10:34:46 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
13/06/14 10:34:46 INFO spark.SparkEnv: Registering BlockManagerMaster
13/06/14 10:34:46 INFO storage.MemoryStore: MemoryStore started with capacity 1161.6 MB.
13/06/14 10:34:46 INFO storage.DiskStore: Created local directory at /tmp/spark-local-20130614103446-fc9e
13/06/14 10:34:46 INFO network.ConnectionManager: Bound socket to port 39216 with id = ConnectionManagerId(eduZai,39216)
13/06/14 10:34:46 INFO storage.BlockManagerMaster: Trying to register BlockManager
13/06/14 10:34:46 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager eduZai:39216 with 1161.6 MB RAM
13/06/14 10:34:46 INFO storage.BlockManagerMaster: Registered BlockManager
13/06/14 10:34:46 INFO server.Server: jetty-7.6.8.v20121106
13/06/14 10:34:46 INFO server.AbstractConnector: Started SocketC...@0.0.0.0:50945
13/06/14 10:34:46 INFO broadcast.HttpBroadcast: Broadcast server started at http://172.20.62.231:50945
13/06/14 10:34:46 INFO spark.SparkEnv: Registering MapOutputTracker
13/06/14 10:34:46 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-83c3f75b-c12d-44e9-aeae-86ac37c95996
13/06/14 10:34:46 INFO server.Server: jetty-7.6.8.v20121106
13/06/14 10:34:46 INFO server.AbstractConnector: Started SocketC...@0.0.0.0:36774
13/06/14 10:34:46 INFO io.IoWorker: IoWorker thread 'spray-io-worker-0' started
13/06/14 10:34:47 INFO server.HttpServer: akka://spark/user/BlockManagerHTTPServer started on /0.0.0.0:51050
13/06/14 10:34:47 INFO storage.BlockManagerUI: Started BlockManager web UI at http://eduZai:51050
13/06/14 10:34:47 INFO spark.SparkContext: Added JAR /target/PROGRAM_JAR.jar at http://172.20.62.231:36774/jars/PROGRAM_JAR.jar with timestamp 1371202487385
13/06/14 10:34:47 INFO spark.SparkContext: Starting job: foreach at FrameworkTestSpark.java:45
13/06/14 10:34:47 INFO scheduler.DAGScheduler: Got job 0 (foreach at FrameworkTestSpark.java:45) with 4 output partitions (allowLocal=false)
13/06/14 10:34:47 INFO scheduler.DAGScheduler: Final stage: Stage 0 (parallelize at FrameworkSchedulerForSpark.java:44)
13/06/14 10:34:47 INFO scheduler.DAGScheduler: Parents of final stage: List()
13/06/14 10:34:47 INFO scheduler.DAGScheduler: Missing parents: List()
13/06/14 10:34:47 INFO scheduler.DAGScheduler: Submitting Stage 0 (ParallelCollectionRDD[0] at parallelize at FrameworkTestSpark.java:44), which has no missing parents
13/06/14 10:34:47 INFO scheduler.DAGScheduler: Submitting 4 missing tasks from Stage 0 (ParallelCollectionRDD[0] at parallelize at FrameworkTestSpark.java:44)
13/06/14 10:34:47 INFO local.LocalScheduler: Running ResultTask(0, 2)
13/06/14 10:34:47 INFO local.LocalScheduler: Running ResultTask(0, 0)
13/06/14 10:34:47 INFO local.LocalScheduler: Running ResultTask(0, 1)
13/06/14 10:34:47 INFO local.LocalScheduler: Running ResultTask(0, 3)
13/06/14 10:34:47 ERROR local.LocalScheduler: Exception in task 0
java.io.NotSerializableException: FrameworkTestSpark
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1180)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:346)
at spark.JavaSerializationStream.writeObject(JavaSerializer.scala:11)
at spark.scheduler.ResultTask$.serializeInfo(ResultTask.scala:27)
at spark.scheduler.ResultTask.writeExternal(ResultTask.scala:91)
at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1443)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1414)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:346)
at spark.JavaSerializationStream.writeObject(JavaSerializer.scala:11)
at spark.JavaSerializerInstance.serialize(JavaSerializer.scala:31)
at spark.scheduler.Task$.serializeWithDependencies(Task.scala:61)
at spark.scheduler.local.LocalScheduler.runTask$1(LocalScheduler.scala:66)
at spark.scheduler.local.LocalScheduler$$anon$1.run(LocalScheduler.scala:49)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

Any suggestions?

ej.fe...@campus.fct.unl.pt

Jun 14, 2013, 7:14:00 AM
to scala...@googlegroups.com
I was able to solve it with the following strategy:

public class FrameworkTestForSpark extends AbstractBackend {

    public FrameworkTestForSpark(TaskManager taskManager, String master, int nrNodes) {
        super(taskManager);
        this.nrNodes = nrNodes;
        executeTasks(master);
    }
    private void executeTasks(String master) {
        JavaSparkContext jsc = new JavaSparkContext(master, "Parameter Sweep by Spark",
                SPARK_HOME, BackendUtilities.getJarPath());

        List<Task> taskList;
        while ((taskList = nextNTasks(nrNodes)) != null && !taskList.isEmpty()) {
            JavaRDD<Task> tasks = jsc.parallelize(taskList, taskList.size());
            new ExecuteTasks(tasks);
        }
    }
}

class ExecuteTasks implements Serializable {
    ExecuteTasks(JavaRDD<Task> tasks) {
        tasks.foreach(new TaskFunction());
    }

    class TaskFunction extends VoidFunction<Task> {
        @Override
        public void call(Task task) throws Exception {
        }
    }
}
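
I think the problem before was that the anonymous function kept a reference to the outer FrameworkTestSpark instance, which is not Serializable, so Spark could not serialize the task. Another option (just a sketch; I'm assuming the Spark 0.7 Java API in spark.api.java.function and that Task itself is Serializable) would be a standalone function class that captures nothing from the backend:

import java.io.Serializable;
import spark.api.java.function.VoidFunction;

// Standalone function: it keeps no implicit reference to the non-serializable
// backend, so the task closure can be serialized and shipped to the workers.
// Task is my own class and must itself implement Serializable.
class ExecuteTaskFunction extends VoidFunction<Task> implements Serializable {
    @Override
    public void call(Task task) throws Exception {
        task.execute();
    }
}

// usage:
// tasks.foreach(new ExecuteTaskFunction());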

On Wednesday, 12 June 2013 at 14:40:02 UTC+1, ej.fe...@campus.fct.unl.pt wrote: