'No FileSystem for scheme: hdfs' exception when running Spark on YARN


Yang Fang

Nov 7, 2012, 3:05:28 AM
to spark...@googlegroups.com
Hi all,

I have a 13-node CDH 4.1.1 cluster, and I want to run Spark on YARN. Everything was fine at the beginning, except that I had to add some Hadoop conf files to 'core/src/main/resources/'. SparkPi runs, which is very exciting. But when I run HdfsTest, the container logs show the following:

Exception in thread "Thread-2" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at spark.deploy.yarn.ApplicationMaster$$anon$1.run(ApplicationMaster.scala:110)
Caused by: java.io.IOException: No FileSystem for scheme: hdfs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2130)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2137)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:80)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2176)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2158)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:302)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:194)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:206)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:251)
at spark.rdd.HadoopRDD.&lt;init&gt;(HadoopRDD.scala:57)
at spark.SparkContext.hadoopFile(SparkContext.scala:244)
at spark.SparkContext.textFile(SparkContext.scala:213)
at spark.examples.HdfsTest$.main(HdfsTest.scala:8)
at spark.examples.HdfsTest.main(HdfsTest.scala)
... 5 more

Could you do me a favor, please?

Yours,
Yang

Matei Zaharia

Nov 7, 2012, 11:35:20 AM
to spark...@googlegroups.com
Did you compile Spark against the CDH4.1.1 version of Hadoop? You can change the Hadoop version in project/SparkBuild.scala, as documented at http://www.spark-project.org/docs/0.6.0/.
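For example, for a CDH4 build of the MR1 flavor, the relevant lines in project/SparkBuild.scala end up looking roughly like this (a sketch only; the exact values to use depend on your Hadoop flavor):

  // For Hadoop 2 versions such as "2.0.0-mr1-cdh4.1.1", set the HADOOP_MAJOR_VERSION to "2"
  val HADOOP_VERSION = "2.0.0-mr1-cdh4.1.1"
  val HADOOP_MAJOR_VERSION = "2"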

Matei

Yang Fang

Nov 7, 2012, 9:51:06 PM
to spark...@googlegroups.com
Yes, I compiled Spark against the CDH 4.1.1 version of Hadoop. I use the yarn branch, and it's already up to date.

Yang Fang

Nov 8, 2012, 2:21:40 AM
to spark...@googlegroups.com
It's fixed; just adding the 'fs.hdfs.impl' property to the conf is enough. The reason is that CDH 4.1.1 removed the 'fs.hdfs.impl' property from core-default.xml.
But there is still something wrong with HdfsTest. I start it like this:

SPARK_JAR=./core/target/spark-core-assembly-0.6.0.jar ./run spark.deploy.yarn.Client --jar examples/target/scala-2.9.2/spark-examples_2.9.2-0.6.0.jar --class spark.examples.HdfsTest --args standalone --num-workers 8 --worker-memory 500m --worker-cores 2

log for container has lots of exception info:

12/11/08 15:01:59 ERROR BlockManagerMasterActor: key not found: BlockManagerId(mis21238.hadoop.data.sina.com.cn, 58334)
java.util.NoSuchElementException: key not found: BlockManagerId(mis21238.hadoop.data.sina.com.cn, 58334)
	at scala.collection.MapLike$class.default(MapLike.scala:225)
	at scala.collection.mutable.HashMap.default(HashMap.scala:45)
	at scala.collection.MapLike$class.apply(MapLike.scala:135)
	at scala.collection.mutable.HashMap.apply(HashMap.scala:45)
	at spark.storage.BlockManagerMasterActor.spark$storage$BlockManagerMasterActor$$heartBeat(BlockManagerMaster.scala:244)
	at spark.storage.BlockManagerMasterActor$$anonfun$receive$1.apply(BlockManagerMaster.scala:189)
	at spark.storage.BlockManagerMasterActor$$anonfun$receive$1.apply(BlockManagerMaster.scala:184)
	at akka.actor.Actor$class.apply(Actor.scala:318)
	at spark.storage.BlockManagerMasterActor.apply(BlockManagerMaster.scala:91)
	at akka.actor.ActorCell.invoke(ActorCell.scala:626)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:197)
	at akka.dispatch.Mailbox.run(Mailbox.scala:179)
	at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516)
	at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
	at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
	at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
	at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

Matei Zaharia

Nov 8, 2012, 2:39:19 AM
to spark...@googlegroups.com
Is mis21238.hadoop.data.sina.com.cn your master by any chance? We recently fixed a bug in branch-0.6 that would cause this problem. If it's *not* your master, then the problem is probably that it's not reporting its DNS name correctly, or that its DNS name does not resolve to the right IP on every machine. Double-check that.
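For a quick sanity check, a snippet along these lines (just a sketch added for illustration, not from the original thread; the hostname is the one from the log above) can be run on each node and the output compared across machines:

  import java.net.InetAddress

  // Sketch: resolve the reported worker hostname and print this machine's
  // canonical name, so the results can be compared across the cluster.
  object DnsCheck {
    def main(args: Array[String]): Unit = {
      val host = "mis21238.hadoop.data.sina.com.cn"  // hostname from the log above
      println(InetAddress.getAllByName(host).map(_.getHostAddress).mkString(", "))
      println(InetAddress.getLocalHost.getCanonicalHostName)
    }
  }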

Matei

Matei Zaharia

Nov 8, 2012, 2:41:31 AM
to spark...@googlegroups.com
By the way, thanks for reporting the fs.hdfs.impl thing; we should document that.

Matei

Yang Fang

Nov 8, 2012, 3:06:29 AM
to spark...@googlegroups.com
No, it's not the master. I ran HdfsTest several times. Sometimes it finishes successfully, sometimes it fails. When it fails, the ApplicationMaster log contains the "BlockManagerMasterActor: key not found: BlockManagerId java.util.NoSuchElementException: key not found: BlockManagerId" exception.

@matei, could you give me a link to the bug you just mentioned?

Matei Zaharia

Nov 8, 2012, 3:08:50 AM
to spark...@googlegroups.com
This is what I fixed: https://github.com/mesos/spark/commit/e782187b4af3b2ffe83e67fee7c783b5dfcd09e5 but it only affected jobs that try to do a take() or collect() on a cached RDD from the master process. It would also have happened deterministically. So my guess is that in your case, DNS is misconfigured for some of the machines.

Matei

swarnim kulkarni

Feb 14, 2013, 10:08:42 PM
to spark...@googlegroups.com
I just ran into this problem as well while compiling against CDH 4.1.1. You don't need to set that property if you have the appropriate HDFS jars on the Hadoop classpath. To resolve the error, if you are using Maven, add the following dependency to your pom:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>2.0.0-cdh4.1.1</version>
</dependency>

You should then be good to go.

Laxman Vemula

Apr 2, 2013, 3:43:21 PM
to spark...@googlegroups.com
Is there any way we can add the above dependency when building with sbt?

Thanks,
Laxman

Patrick Wendell

Apr 2, 2013, 4:09:33 PM
to spark...@googlegroups.com
Open project/SparkBuild.scala

after this line:
"org.apache.hadoop" % "hadoop-core" % HADOOP_VERSION,

insert this line
"org.apache.hadoop" % "hadoop-hdfs" % "2.0.0-cdh4.1.1",

then run sbt/sbt clean and rebuild.
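For context, the surrounding block in project/SparkBuild.scala ends up looking roughly like this (a sketch only; the other entries in the Seq vary by Spark version):

  libraryDependencies ++= Seq(
    // ... other dependencies ...
    "org.apache.hadoop" % "hadoop-core" % HADOOP_VERSION,
    "org.apache.hadoop" % "hadoop-hdfs" % "2.0.0-cdh4.1.1",
    // ... other dependencies ...
  )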

Vipul Pandey

Aug 16, 2013, 6:29:49 PM
to spark...@googlegroups.com
I was getting the following error myself after changing the project/SparkBuild.scala file to set:
 // For Hadoop 2 versions such as "2.0.0-mr1-cdh4.1.1", set the HADOOP_MAJOR_VERSION to "2"
  val HADOOP_VERSION = "2.0.0-mr1-cdh4.1.1"
  val HADOOP_MAJOR_VERSION = "2"



Caused by: java.io.IOException: No FileSystem for scheme: hdfs 

I inserted the line you suggested in the right place:
      "org.apache.hadoop" % "hadoop-hdfs" % "2.0.0-cdh4.1.1", 

But now it's giving me a different error during compilation itself.
I run sbt/sbt clean followed by assembly, and this is what I get:

[error]  public: bad organisation found in http://repo1.maven.org/maven2/commons-daemon/commons-daemon/1.0.3/commons-daemon-1.0.3.pom: expected='commons-daemon' found='org.apache.commons'
[info] Resolving cglib#cglib-nodep;2.2.2 ...
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  ::          UNRESOLVED DEPENDENCIES         ::
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  :: commons-daemon#commons-daemon;1.0.3: java.text.ParseException: inconsistent module descriptor file found in 'http://repo1.maven.org/maven2/commons-daemon/commons-daemon/1.0.3/commons-daemon-1.0.3.pom': bad organisation: expected='commons-daemon' found='org.apache.commons'; 
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
sbt.ResolveException: unresolved dependency: commons-daemon#commons-daemon;1.0.3: java.text.ParseException: inconsistent module descriptor file found in 'http://repo1.maven.org/maven2/commons-daemon/commons-daemon/1.0.3/commons-daemon-1.0.3.pom': bad organisation: expected='commons-daemon' found='org.apache.commons'; 
at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:214)
at sbt.IvyActions$$anonfun$update$1.apply(IvyActions.scala:122)
at sbt.IvyActions$$anonfun$update$1.apply(IvyActions.scala:121)
at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:117)
at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:117)


Any clue what's going on?

Thanks,
Vipul

Vipul

Aug 16, 2013, 7:49:56 PM
to spark...@googlegroups.com
I got around that issue by simply downloading the jar file manually and placing it in the local repository, so that's not an issue anymore. But even after all that circus, I'm back at my original error:

13/08/16 16:45:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/08/16 16:45:51 WARN snappy.LoadSnappy: Snappy native library not loaded
[error] (run-main) java.io.IOException: No FileSystem for scheme: hdfs
java.io.IOException: No FileSystem for scheme: hdfs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2206)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2213)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:80)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2252)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2234)


using sbt assembly, after adding this line to the build:
     "org.apache.hadoop" % "hadoop-hdfs" % "2.0.0-cdh4.1.1", 

Vipul

Aug 20, 2013, 5:32:18 PM
to spark...@googlegroups.com
Any solutions, anyone?

Arun Ahuja

Dec 9, 2013, 1:31:33 PM
to spark...@googlegroups.com
Wanted to see if anyone had found a solution for this. I have tried both suggested fixes:

1) In Maven, added hadoop-client 2.2 and even hadoop-hdfs 2.2
2) Added the property

<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
  <description>The FileSystem for hdfs: uris.</description>
</property>

to core-site.xml.

Neither seems to resolve the error:

Exception in thread "main" java.lang.RuntimeException: java.io.IOException: No FileSystem for scheme: hdfs
        at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(FileOutputFormat.java:164)
        at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:558)
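A further workaround that is often suggested for this error (a sketch only, not something confirmed in this thread): register the filesystem implementation classes programmatically on the Hadoop Configuration the job uses, so the lookup no longer depends on core-default.xml surviving the jar assembly.

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, LocalFileSystem}
  import org.apache.hadoop.hdfs.DistributedFileSystem

  // Sketch: set the implementation classes explicitly before touching HDFS.
  // The namenode URI below is a placeholder.
  val conf = new Configuration()
  conf.set("fs.hdfs.impl", classOf[DistributedFileSystem].getName)
  conf.set("fs.file.impl", classOf[LocalFileSystem].getName)
  val fs = FileSystem.get(new java.net.URI("hdfs://namenode:8020"), conf)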