Error when running Spark locally.

Tuantuan

Jun 27, 2013, 12:56:05 PM
to spark...@googlegroups.com
Hi,
I'm trying to initialize an RDD and run Spark locally with:

        JavaSparkContext sc = new JavaSparkContext("local", "Simple Job", "C:/Users/tuantuan/Desktop/spark-0.7.2",
                "C:/Users/tuantuan/Desktop/spark-0.7.2/core/target/spark-core-assembly-0.7.2");
        JavaRDD<String> testData = sc.textFile("data.txt");
        System.out.println(testData.count());

but I get the following errors:
13/06/27 12:52:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/06/27 12:52:16 WARN snappy.LoadSnappy: Snappy native library not loaded
...
13/06/27 12:52:16 ERROR local.LocalScheduler: Exception in task 0
java.io.IOException: No FileSystem for scheme: C
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1383)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
    at spark.Utils$.fetchFile(Utils.scala:172)
    at spark.scheduler.local.LocalScheduler$$anonfun$updateDependencies$6.apply(LocalScheduler.scala:131)
    at spark.scheduler.local.LocalScheduler$$anonfun$updateDependencies$6.apply(LocalScheduler.scala:129)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
    at scala.collection.Iterator$class.foreach(Iterator.scala:772)
    at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:95)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:742)
    at spark.scheduler.local.LocalScheduler.updateDependencies(LocalScheduler.scala:129)
    at spark.scheduler.local.LocalScheduler.runTask$1(LocalScheduler.scala:69)
    at spark.scheduler.local.LocalScheduler$$anon$1.run(LocalScheduler.scala:49)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
13/06/27 12:52:16 INFO scheduler.DAGScheduler: Failed to run count at example1.java:24
Exception in thread "main" spark.SparkException: Job failed: ResultTask(0, 0) failed: ExceptionFailure(java.io.IOException: No FileSystem for scheme: C)
    at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:642)
    at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:640)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:640)
    at spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:601)
    at spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:300)
    at spark.scheduler.DAGScheduler.spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:364)
    at spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:107)

Does anyone know how to solve it?
Thanks.

Aslan Bekirov

Jun 28, 2013, 4:01:07 AM
to spark...@googlegroups.com

Where is data.txt located?

If it is in HDFS, can you give the full path of data.txt, like hdfs://masterIP:port/user/data.txt:

JavaRDD<String> testData = sc.textFile("hdfs://masterIP:port/path/to/data.txt");

And try again.

BR,
Aslan

Tuantuan

Jun 28, 2013, 11:23:36 AM
to spark...@googlegroups.com
Hi,
It is not in HDFS; it is just a file on my laptop. Changing the path to the full path of the file still results in the same error.
By the way, do I need to have Hadoop installed in order to run the task locally?

Josh Rosen

Jun 28, 2013, 12:05:17 PM
to spark...@googlegroups.com
It looks like you're running Spark under Windows. The Hadoop filesystem layer seems to be interpreting the "C" in "C:/" as a filesystem scheme like "file://" or "hdfs://" instead of treating it as the start of an absolute path to a local file, which matches the "No FileSystem for scheme: C" in your trace.

I don't know the solution to this problem off the top of my head, but I don't think it is Spark-specific. I'd try searching for examples of reading local files using Hadoop on Windows. Maybe a regular Windows user could chime in with a solution?
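One thing that might be worth trying (untested, since I'm not on Windows): pass an explicit file:// URI so the drive letter isn't parsed as a scheme. A minimal sketch, assuming data.txt sits on your desktop:

        // Untested sketch: an explicit file:// scheme should keep Hadoop's
        // FileSystem layer from treating the drive letter "C" as a URI scheme.
        JavaRDD<String> testData = sc.textFile("file:///C:/Users/tuantuan/Desktop/data.txt");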

Tuantuan

Jun 28, 2013, 12:29:57 PM
to spark...@googlegroups.com
Thanks.
The problem is solved by just using:

        JavaSparkContext sc = new JavaSparkContext("local", "Simple Job");
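As far as I can tell from the stack trace, the failure was in spark.Utils.fetchFile while resolving the assembly jar path passed to the constructor, so dropping the sparkHome and jar arguments sidesteps the Windows path parsing entirely. The full snippet that works for me:

        // Two-argument constructor: no sparkHome or jar path for the
        // Hadoop FileSystem layer to misparse as a "C" scheme.
        JavaSparkContext sc = new JavaSparkContext("local", "Simple Job");
        JavaRDD<String> testData = sc.textFile("data.txt");
        System.out.println(testData.count());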