'No FileSystem for scheme: hdfs' exception when running Spark on YARN


Yang Fang

Nov 7, 2012, 3:05:28 AM
to spark...@googlegroups.com
Hi all,

I have a 13-node CDH 4.1.1 cluster, and I want to run Spark on YARN. Everything was fine at the beginning, except that I had to add some Hadoop conf files to 'core/src/main/resources/'. SparkPi runs, which is very exciting. But when I run HdfsTest, the container logs show the following:

Exception in thread "Thread-2" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at spark.deploy.yarn.ApplicationMaster$$anon$1.run(ApplicationMaster.scala:110)
Caused by: java.io.IOException: No FileSystem for scheme: hdfs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2130)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2137)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:80)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2176)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2158)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:302)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:194)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:206)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:251)
at spark.rdd.HadoopRDD.&lt;init&gt;(HadoopRDD.scala:57)
at spark.SparkContext.hadoopFile(SparkContext.scala:244)
at spark.SparkContext.textFile(SparkContext.scala:213)
at spark.examples.HdfsTest$.main(HdfsTest.scala:8)
at spark.examples.HdfsTest.main(HdfsTest.scala)
... 5 more

Could you do me a favor, please?

Yours,
Yang

Matei Zaharia

Nov 7, 2012, 11:35:20 AM
to spark...@googlegroups.com
Did you compile Spark against the CDH4.1.1 version of Hadoop? You can change the Hadoop version in project/SparkBuild.scala, as documented at http://www.spark-project.org/docs/0.6.0/.
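For example, for a CDH4 build of the MR1 flavor, the relevant lines in project/SparkBuild.scala end up looking roughly like this (a sketch only; the exact values to use depend on your Hadoop flavor):

  // For Hadoop 2 versions such as "2.0.0-mr1-cdh4.1.1", set the HADOOP_MAJOR_VERSION to "2"
  val HADOOP_VERSION = "2.0.0-mr1-cdh4.1.1"
  val HADOOP_MAJOR_VERSION = "2"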

Matei

Yang Fang

Nov 7, 2012, 9:51:06 PM
to spark...@googlegroups.com
Yes, I compiled Spark against the CDH 4.1.1 version of Hadoop. I use the yarn branch, and it's already up to date.

Yang Fang

Nov 8, 2012, 2:21:40 AM
to spark...@googlegroups.com
It's fixed; just adding the 'fs.hdfs.impl' property to the conf is enough. The reason is that CDH 4.1.1 removed the 'fs.hdfs.impl' property from core-default.xml.
But there is still something wrong with HdfsTest. I start it like this:

SPARK_JAR=./core/target/spark-core-assembly-0.6.0.jar ./run spark.deploy.yarn.Client --jar examples/target/scala-2.9.2/spark-examples_2.9.2-0.6.0.jar --class spark.examples.HdfsTest --args standalone --num-workers 8 --worker-memory 500m --worker-cores 2

log for container has lots of exception info:

12/11/08 15:01:59 ERROR BlockManagerMasterActor: key not found: BlockManagerId(mis21238.hadoop.data.sina.com.cn, 58334)
java.util.NoSuchElementException: key not found: BlockManagerId(mis21238.hadoop.data.sina.com.cn, 58334)
	at scala.collection.MapLike$class.default(MapLike.scala:225)
	at scala.collection.mutable.HashMap.default(HashMap.scala:45)
	at scala.collection.MapLike$class.apply(MapLike.scala:135)
	at scala.collection.mutable.HashMap.apply(HashMap.scala:45)
	at spark.storage.BlockManagerMasterActor.spark$storage$BlockManagerMasterActor$$heartBeat(BlockManagerMaster.scala:244)
	at spark.storage.BlockManagerMasterActor$$anonfun$receive$1.apply(BlockManagerMaster.scala:189)
	at spark.storage.BlockManagerMasterActor$$anonfun$receive$1.apply(BlockManagerMaster.scala:184)
	at akka.actor.Actor$class.apply(Actor.scala:318)
	at spark.storage.BlockManagerMasterActor.apply(BlockManagerMaster.scala:91)
	at akka.actor.ActorCell.invoke(ActorCell.scala:626)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:197)
	at akka.dispatch.Mailbox.run(Mailbox.scala:179)
	at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516)
	at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
	at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
	at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
	at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

Matei Zaharia

Nov 8, 2012, 2:39:19 AM
to spark...@googlegroups.com
Is mis21238.hadoop.data.sina.com.cn your master by any chance? We recently fixed a bug in branch-0.6 that would cause this problem. If it's *not* your master, then the problem is probably that it's not reporting its DNS name correctly, or that its DNS name does not resolve to the right IP on every machine. Double-check that.
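For a quick sanity check, a snippet along these lines (just a sketch added for illustration, not from the original thread; the hostname is the one from the log above) can be run on each node and the output compared across machines:

  import java.net.InetAddress

  // Sketch: resolve the reported worker hostname and print this machine's
  // canonical name, so the results can be compared across the cluster.
  object DnsCheck {
    def main(args: Array[String]): Unit = {
      val host = "mis21238.hadoop.data.sina.com.cn"  // hostname from the log above
      println(InetAddress.getAllByName(host).map(_.getHostAddress).mkString(", "))
      println(InetAddress.getLocalHost.getCanonicalHostName)
    }
  }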

Matei

Matei Zaharia

Nov 8, 2012, 2:41:31 AM
to spark...@googlegroups.com
By the way, thanks for reporting the fs.hdfs.impl thing; we should document that.

Matei

Yang Fang

Nov 8, 2012, 3:06:29 AM
to spark...@googlegroups.com
No, it's not the master. I ran HdfsTest several times. Sometimes it finishes successfully, sometimes it fails. When it fails, the ApplicationMaster log contains the "BlockManagerMasterActor: key not found: BlockManagerId java.util.NoSuchElementException: key not found: BlockManagerId" exception.

@matei, could you give me a link to the bug you just mentioned?

Matei Zaharia

Nov 8, 2012, 3:08:50 AM
to spark...@googlegroups.com
This is what I fixed: https://github.com/mesos/spark/commit/e782187b4af3b2ffe83e67fee7c783b5dfcd09e5 but it only affected jobs that try to do a take() or collect() on a cached RDD from the master process. It would also have happened deterministically. So my guess is that in your case, DNS is misconfigured for some of the machines.

Matei

swarnim kulkarni

Feb 14, 2013, 10:08:42 PM
to spark...@googlegroups.com
I just ran into this problem as well while compiling against CDH 4.1.1. You don't need to set that property if you have the appropriate HDFS jars on the Hadoop classpath. To resolve the error, if you are using Maven, add the following dependency to your pom:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>2.0.0-cdh4.1.1</version>
</dependency>

You should then be good to go.

Laxman Vemula

Apr 2, 2013, 3:43:21 PM
to spark...@googlegroups.com
Is there any way we can add the above dependency when building with sbt?

Thanks,
Laxman

Patrick Wendell

Apr 2, 2013, 4:09:33 PM
to spark...@googlegroups.com
Open project/SparkBuild.scala

after this line:
"org.apache.hadoop" % "hadoop-core" % HADOOP_VERSION,

insert this line
"org.apache.hadoop" % "hadoop-hdfs" % "2.0.0-cdh4.1.1",

then run sbt/sbt clean and rebuild.
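For context, the surrounding block in project/SparkBuild.scala ends up looking roughly like this (a sketch only; the other entries in the Seq vary by Spark version):

  libraryDependencies ++= Seq(
    // ... other dependencies ...
    "org.apache.hadoop" % "hadoop-core" % HADOOP_VERSION,
    "org.apache.hadoop" % "hadoop-hdfs" % "2.0.0-cdh4.1.1",
    // ... other dependencies ...
  )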

Vipul Pandey

Aug 16, 2013, 6:29:49 PM
to spark...@googlegroups.com
I was getting the following error myself after changing the project/SparkBuild.scala file to set:
 // For Hadoop 2 versions such as "2.0.0-mr1-cdh4.1.1", set the HADOOP_MAJOR_VERSION to "2"
  val HADOOP_VERSION = "2.0.0-mr1-cdh4.1.1"
  val HADOOP_MAJOR_VERSION = "2"



Caused by: java.io.IOException: No FileSystem for scheme: hdfs 

I inserted the line you suggested in the right place:
      "org.apache.hadoop" % "hadoop-hdfs" % "2.0.0-cdh4.1.1", 

But now it's giving me a different error during compilation itself.
I run sbt/sbt clean followed by assembly, and this is what I get:

[error]  public: bad organisation found in http://repo1.maven.org/maven2/commons-daemon/commons-daemon/1.0.3/commons-daemon-1.0.3.pom: expected='commons-daemon' found='org.apache.commons'
[info] Resolving cglib#cglib-nodep;2.2.2 ...
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  ::          UNRESOLVED DEPENDENCIES         ::
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  :: commons-daemon#commons-daemon;1.0.3: java.text.ParseException: inconsistent module descriptor file found in 'http://repo1.maven.org/maven2/commons-daemon/commons-daemon/1.0.3/commons-daemon-1.0.3.pom': bad organisation: expected='commons-daemon' found='org.apache.commons'; 
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
sbt.ResolveException: unresolved dependency: commons-daemon#commons-daemon;1.0.3: java.text.ParseException: inconsistent module descriptor file found in 'http://repo1.maven.org/maven2/commons-daemon/commons-daemon/1.0.3/commons-daemon-1.0.3.pom': bad organisation: expected='commons-daemon' found='org.apache.commons'; 
at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:214)
at sbt.IvyActions$$anonfun$update$1.apply(IvyActions.scala:122)
at sbt.IvyActions$$anonfun$update$1.apply(IvyActions.scala:121)
at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:117)
at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:117)


Any clue what's going on?

Thanks,
Vipul

Vipul

Aug 16, 2013, 7:49:56 PM
to spark...@googlegroups.com
I got around that issue by simply downloading the jar file manually and placing it in the local repository, so that's not an issue anymore. But even after all that circus, I'm back at my original error:

13/08/16 16:45:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/08/16 16:45:51 WARN snappy.LoadSnappy: Snappy native library not loaded
[error] (run-main) java.io.IOException: No FileSystem for scheme: hdfs
java.io.IOException: No FileSystem for scheme: hdfs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2206)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2213)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:80)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2252)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2234)


using sbt assembly, after adding this line to the build:
     "org.apache.hadoop" % "hadoop-hdfs" % "2.0.0-cdh4.1.1", 

Vipul

Aug 20, 2013, 5:32:18 PM
to spark...@googlegroups.com
Any solutions, anyone?

Arun Ahuja

Dec 9, 2013, 1:31:33 PM
to spark...@googlegroups.com
Wanted to see if anyone had found a solution for this. I have tried both suggested fixes:

1) In Maven, added hadoop-client 2.2 and even hadoop-hdfs 2.2
2) Added the property

<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
  <description>The FileSystem for hdfs: uris.</description>
</property>

to core-site.xml.

Neither seems to resolve the error:

Exception in thread "main" java.lang.RuntimeException: java.io.IOException: No FileSystem for scheme: hdfs
        at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(FileOutputFormat.java:164)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:558)
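A further workaround that is often suggested for this error (a sketch only, not something confirmed in this thread): register the filesystem implementation classes programmatically on the Hadoop Configuration the job uses, so the lookup no longer depends on core-default.xml surviving the jar assembly.

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, LocalFileSystem}
  import org.apache.hadoop.hdfs.DistributedFileSystem

  // Sketch: set the implementation classes explicitly before touching HDFS.
  // The namenode URI below is a placeholder.
  val conf = new Configuration()
  conf.set("fs.hdfs.impl", classOf[DistributedFileSystem].getName)
  conf.set("fs.file.impl", classOf[LocalFileSystem].getName)
  val fs = FileSystem.get(new java.net.URI("hdfs://namenode:8020"), conf)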