HDFS FileSystem error while running Spark job


Gaurav Dasgupta

Apr 2, 2013, 2:18:19 AM4/2/13
to spark...@googlegroups.com
Hi,

I have installed Spark 0.7.0 on my cluster. I edited the SparkBuild.scala file to set:

val HADOOP_VERSION = "2.0.0-mr1-cdh4.2.0"
val HADOOP_MAJOR_VERSION = "2"

I have CDH4 MR1 installed on my cluster. "hadoop version" reports: Hadoop 2.0.0-cdh4.2.0.

Now, when I try to run a Spark job that reads its input from HDFS and writes its output back there, I get the following error:

java.io.IOException: No FileSystem for scheme: hdfs
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2250)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2257)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:86)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2296)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2278)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:316)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:194)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:173)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:205)
    at spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:52)
    at spark.RDD.partitions(RDD.scala:168)
    at spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:9)
    at spark.RDD.partitions(RDD.scala:168)
    at spark.SparkContext.runJob(SparkContext.scala:624)
    at spark.RDD.count(RDD.scala:490)
    at Kruskals$.main(Kruskals.scala:198)
    at Kruskals.main(Kruskals.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at scala.tools.nsc.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:78)
    at scala.tools.nsc.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:24)
    at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:88)
    at scala.tools.nsc.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:78)
    at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:101)
    at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:33)
    at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:40)
    at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:56)
    at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:80)
    at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:89)
    at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)


Can someone tell me where I'm going wrong?

Thanks,
Gaurav

Gaurav Dasgupta

Apr 2, 2013, 3:26:16 AM4/2/13
to spark...@googlegroups.com
Hi All,

This is solved: the error went away after adding hadoop-hdfs.jar to the application's classpath.
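For anyone hitting the same error, here is a sketch of one way to apply the fix, via the SPARK_CLASSPATH environment variable. The jar path below is an assumption based on a typical CDH4 package layout and may differ on your nodes:

```shell
# Illustrative: make the HDFS client classes (DistributedFileSystem etc.)
# visible to Spark. The jar location is an assumption for a CDH4 install;
# adjust it to wherever hadoop-hdfs.jar lives on your machines.
HDFS_JAR=/usr/lib/hadoop-hdfs/hadoop-hdfs.jar
# Prepend the jar, keeping any classpath entries already set.
export SPARK_CLASSPATH="$HDFS_JAR${SPARK_CLASSPATH:+:$SPARK_CLASSPATH}"
echo "$SPARK_CLASSPATH"
```

Without hadoop-hdfs.jar on the classpath, Hadoop's FileSystem cannot resolve a handler for the "hdfs" scheme, which is exactly the "No FileSystem for scheme: hdfs" message above.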

Thanks,
Gaurav