how to properly set up spark on a yarn cluster?


Alexander Guzhva

May 10, 2013, 5:34:01 PM
to spark...@googlegroups.com
Hi guys, it looks like I'm doing something wrong:

1) There are two Fedora virtual machines under virtualbox, one is master and the other is slave.

2) /etc/hosts looks like the following:
127.0.0.1 localhost
::1 localhost
master_ip master
slave_ip slave

3) hadoop 2.0.4-alpha can run the standard examples from the distribution with no problems, so ssh is OK

4) scala 2.9.3, spark 0.7.0 were installed

5) .bashrc contains exports of JAVA_HOME, hadoop stuff, SCALA_HOME, SPARK_HOME

6) "./run spark.examples.SparkPi local" is fine

7) sbt clean compile (with project/SparkBuild.scala modified for yarn), then sbt assembly, then sbt package - all fine; 0.8.0-version files are produced

8) So, trying to run an example in a non-standalone mode.
This command fails:

[hadoop@master master]$ SPARK_JAR=./core/target/spark-core-assembly-0.8.0-SNAPSHOT.jar ./run spark.deploy.yarn.Client   --jar examples/target/scala-2.9.3/spark-examples_2.9.3-0.8.0-SNAPSHOT.jar   --class spark.examples.SparkPi --args standalone

Also, two files with names like 19app.jar and 19spark.jar are created each time in /home/hadoop/spark (which is not the default SPARK home; that is /home/hadoop/SPARK/master/).

13/05/10 17:26:10 INFO AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/05/10 17:26:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/05/10 17:26:11 INFO AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/05/10 17:26:11 INFO Client: Got Cluster metric info from ASM, numNodeManagers=1
13/05/10 17:26:11 INFO Client: Queue info .. queueName=default, queueCurrentCapacity=0.0, queueMaxCapacity=1.0, queueApplicationCount=15, queueChildQueueCount=0
13/05/10 17:26:11 INFO Client: Max mem capabililty of resources in this cluster 8192
13/05/10 17:26:11 INFO Client: Setting up application submission context for ASM
13/05/10 17:26:11 INFO Client: Preparing Local resources
13/05/10 17:26:11 INFO Client: Uploading core/target/spark-core-assembly-0.8.0-SNAPSHOT.jar to file:/home/hadoop/spark/19spark.jar
13/05/10 17:26:11 INFO Client: Uploading examples/target/scala-2.9.3/spark-examples_2.9.3-0.8.0-SNAPSHOT.jar to file:/home/hadoop/spark/19app.jar
13/05/10 17:26:11 INFO Client: Setting up the launch environment
13/05/10 17:26:11 INFO Client: Setting up container launch context
13/05/10 17:26:11 INFO Client: Command for the ApplicationMaster: java  -server -Xmx640m  spark.deploy.yarn.ApplicationMaster --class spark.examples.SparkPi --jar examples/target/scala-2.9.3/spark-examples_2.9.3-0.8.0-SNAPSHOT.jar --args  'standalone'  --worker-memory 1024 --worker-cores 1 --num-workers 2 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
13/05/10 17:26:11 INFO Client: Submitting application to ASM
13/05/10 17:26:12 INFO YarnClientImpl: Submitted application application_1368199037869_0019 to ResourceManager at /0.0.0.0:8032
13/05/10 17:26:13 INFO Client: Application report from ASM:
     application identifier: application_1368199037869_0019
     appId: 19
     clientToken: null
     appDiagnostics:
     appMasterHost: N/A
     appQueue: default
     appMasterRpcPort: 0
     appStartTime: 1368221172004
     yarnAppState: ACCEPTED
     distributedFinalState: UNDEFINED
     appTrackingUrl: master:8088/proxy/application_1368199037869_0019/
     appUser: hadoop
13/05/10 17:26:14 INFO Client: Application report from ASM:
     application identifier: application_1368199037869_0019
     appId: 19
     clientToken: null
     appDiagnostics: Application application_1368199037869_0019 failed 1 times due to AM Container for appattempt_1368199037869_0019_000001 exited with  exitCode: -1000 due to: RemoteTrace:
java.io.FileNotFoundException: File file:/home/hadoop/spark/19app.jar does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:492)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:395)
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:176)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:51)
    at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:284)
    at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:280)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:51)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
 at LocalTrace:
    org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: File file:/home/hadoop/spark/19app.jar does not exist
    at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
    at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:819)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:491)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:218)
    at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
    at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)

.Failing this attempt.. Failing the application.
     appMasterHost: N/A
     appQueue: default
     appMasterRpcPort: 0
     appStartTime: 1368221172004
     yarnAppState: FAILED
     distributedFinalState: FAILED
     appTrackingUrl: master:8088/proxy/application_1368199037869_0019/
     appUser: hadoop


What should I modify in the system configuration?

Thanks.


Arun Ahuja

May 10, 2013, 6:03:50 PM
to spark...@googlegroups.com
Same issue! I sent a question out to the mailing list earlier and got no replies, so hopefully you get some help!

Arun





--
You received this message because you are subscribed to the Google Groups "Spark Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-users...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Mridul Muralidharan

May 11, 2013, 1:27:36 AM
to spark...@googlegroups.com


Hmm, looks like my PR skipped adding HADOOP_CONF_DIR to the CLASSPATH - my apologies!
Essentially, Spark is not connecting to the yarn cluster, but trying to run in local mode.

To test, you can try setting SPARK_CLASSPATH to your yarn configuration directory, to see if it is able to connect to the cluster.
I will update my pull request with this change, so that once merged [1], this should get fixed.
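A minimal sketch of that test, under the assumption that the client just needs the yarn conf dir on its classpath to pick up fs.defaultFS (the HADOOP_CONF_DIR path below is hypothetical; the jar paths and command are the ones from the first post):

```shell
# Hypothetical conf path -- point this at your Hadoop install's conf directory.
# With the conf dir on Spark's classpath, the client should read fs.defaultFS
# and upload the jars to HDFS instead of the local file:// filesystem.
export HADOOP_CONF_DIR=/home/hadoop/hadoop/hadoop-2.0.4-alpha/etc/hadoop
export SPARK_CLASSPATH=$HADOOP_CONF_DIR

SPARK_JAR=./core/target/spark-core-assembly-0.8.0-SNAPSHOT.jar \
  ./run spark.deploy.yarn.Client \
  --jar examples/target/scala-2.9.3/spark-examples_2.9.3-0.8.0-SNAPSHOT.jar \
  --class spark.examples.SparkPi --args standalone
```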


Regards,
Mridul

[1]

Mridul Muralidharan

May 11, 2013, 1:45:34 AM
to spark...@googlegroups.com

Missed the link: you can try merging this PR and testing it out.

[1] https://github.com/mesos/spark/pull/589

Alexander Guzhva

May 13, 2013, 11:29:05 AM
to spark...@googlegroups.com
Thanks, Mridul.
Tried this pull. Maybe I did not merge it correctly, but something is still wrong (maybe I should wait for updates in the master branch). I do see the Spark jar files in HDFS, and I am able to start the SparkPi example, but it fails:


[hadoop@master master]$ SPARK_JAR=./core/target/spark-core-assembly-0.8.0-SNAPSHOT.jar ./run spark.deploy.yarn.Client   --jar examples/target/scala-2.9.3/spark-examples_2.9.3-0.8.0-SNAPSHOT.jar   --class spark.examples.SparkPi
13/05/13 11:20:18 INFO AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/05/13 11:20:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/05/13 11:20:18 INFO AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/05/13 11:20:19 INFO Client: Got Cluster metric info from ASM, numNodeManagers=1
13/05/13 11:20:19 INFO Client: Queue info .. queueName=default, queueCurrentCapacity=0.0, queueMaxCapacity=1.0, queueApplicationCount=0, queueChildQueueCount=0
13/05/13 11:20:19 INFO Client: Max mem capabililty of resources in this cluster 8192
13/05/13 11:20:19 INFO Client: Setting up application submission context for ASM
13/05/13 11:20:19 INFO Client: Preparing Local resources
13/05/13 11:20:19 INFO Client: Uploading core/target/spark-core-assembly-0.8.0-SNAPSHOT.jar to hdfs://master:9000/user/hadoop/spark/1spark.jar
13/05/13 11:20:21 INFO Client: Uploading examples/target/scala-2.9.3/spark-examples_2.9.3-0.8.0-SNAPSHOT.jar to hdfs://master:9000/user/hadoop/spark/1app.jar
13/05/13 11:20:21 INFO Client: Setting up the launch environment
13/05/13 11:20:21 INFO Client: Setting up container launch context
13/05/13 11:20:21 INFO Client: Command for the ApplicationMaster: java  -server -Xmx640m  spark.deploy.yarn.ApplicationMaster --class spark.examples.SparkPi --jar examples/target/scala-2.9.3/spark-examples_2.9.3-0.8.0-SNAPSHOT.jar --worker-memory 1024 --worker-cores 1 --num-workers 2 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
13/05/13 11:20:21 INFO Client: Submitting application to ASM
13/05/13 11:20:21 INFO YarnClientImpl: Submitted application application_1368458375091_0001 to ResourceManager at master/192.168.56.101:8032
13/05/13 11:20:22 INFO Client: Application report from ASM:
     application identifier: application_1368458375091_0001
     appId: 1
     clientToken: null
     appDiagnostics:
     appMasterHost: N/A
     appQueue: default
     appMasterRpcPort: 0
     appStartTime: 1368458421299
     yarnAppState: ACCEPTED
     distributedFinalState: UNDEFINED
     appTrackingUrl: master:8088/proxy/application_1368458375091_0001/
     appUser: hadoop
13/05/13 11:20:23 INFO Client: Application report from ASM:
     application identifier: application_1368458375091_0001
     appId: 1
     clientToken: null
     appDiagnostics:
     appMasterHost: N/A
     appQueue: default
     appMasterRpcPort: 0
     appStartTime: 1368458421299
     yarnAppState: ACCEPTED
     distributedFinalState: UNDEFINED
     appTrackingUrl: master:8088/proxy/application_1368458375091_0001/
     appUser: hadoop
13/05/13 11:20:24 INFO Client: Application report from ASM:
     application identifier: application_1368458375091_0001
     appId: 1
     clientToken: null
     appDiagnostics:
     appMasterHost: N/A
     appQueue: default
     appMasterRpcPort: 0
     appStartTime: 1368458421299
     yarnAppState: ACCEPTED
     distributedFinalState: UNDEFINED
     appTrackingUrl: master:8088/proxy/application_1368458375091_0001/
     appUser: hadoop
13/05/13 11:20:25 INFO Client: Application report from ASM:
     application identifier: application_1368458375091_0001
     appId: 1
     clientToken: null
     appDiagnostics:
     appMasterHost: N/A
     appQueue: default
     appMasterRpcPort: 0
     appStartTime: 1368458421299
     yarnAppState: ACCEPTED
     distributedFinalState: UNDEFINED
     appTrackingUrl: master:8088/proxy/application_1368458375091_0001/
     appUser: hadoop
13/05/13 11:20:26 INFO Client: Application report from ASM:
     application identifier: application_1368458375091_0001
     appId: 1
     clientToken: null
     appDiagnostics: Application application_1368458375091_0001 failed 1 times due to AM Container for appattempt_1368458375091_0001_000001 exited with  exitCode: 1 due to:
.Failing this attempt.. Failing the application.
     appMasterHost: master
     appQueue: default
     appMasterRpcPort: 0
     appStartTime: 1368458421299
     yarnAppState: FAILED
     distributedFinalState: FAILED
     appTrackingUrl: master:8088/cluster/app/application_1368458375091_0001
     appUser: hadoop
[hadoop@master master]$

Hadoop logs contains the following:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop/hadoop-2.0.4-alpha/tmp/nm-local-dir/usercache/hadoop/appcache/application_1368458375091_0001/filecache/6622766977966537636/1spark.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop/hadoop-2.0.4-alpha/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/05/13 11:20:25 INFO yarn.ApplicationMaster: running as user hadoop
13/05/13 11:20:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/05/13 11:20:25 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1368458375091_0001_000001
13/05/13 11:20:25 INFO yarn.ApplicationMaster: Connecting to ResourceManager at master/192.168.56.101:8030
13/05/13 11:20:25 INFO yarn.ApplicationMaster: Registering the ApplicationMaster
13/05/13 11:20:25 INFO yarn.ApplicationMaster: Starting the user JAR in a separate Thread
13/05/13 11:20:25 INFO yarn.ApplicationMaster: Waiting for spark driver to be reachable.
13/05/13 11:20:25 ERROR yarn.ApplicationMaster: Failed to connect to driver at null:null
Usage: SparkPi <master> [<slices>]

Arun Ahuja

May 15, 2013, 4:01:22 PM
to spark...@googlegroups.com
Yes, I'm also still running into issues. I pulled the latest master and tried with that as well, but I see the same errors as above.

I'm also seeing some conflicting documentation; I found this: https://github.com/mesos/spark/blob/master/docs/running-on-yarn.md

But the command mvn -Phadoop2-yarn clean install doesn't seem to work.

So I still do the sbt packaging, but I'm not sure if this part is necessary: "Please comment out the HADOOP_VERSION, HADOOP_MAJOR_VERSION and HADOOP_YARN". I tried that as well, but it didn't seem to make a difference. The error doesn't make sense: it wants to copy the app jar to some random name, e.g. 18app.jar, and then says it doesn't exist, but a simple ls shows that it does.



Mridul Muralidharan

May 16, 2013, 2:12:01 AM
to spark...@googlegroups.com


What was the command used to execute the SparkPi example?

Regards
Mridul

Mridul Muralidharan

May 16, 2013, 8:22:46 AM
to spark...@googlegroups.com
Looks like there were a few inconsistencies in the Spark-on-YARN
documentation which slipped our attention.
Hopefully they are resolved in this PR: https://github.com/mesos/spark/pull/614

Can you try with the changes there?
With a clean Spark master, with that PR merged, I got the SparkPi
example working fine.


After maven install, I did this :

SPARK_JAR=./repl-bin/target/spark-repl-bin-0.8.0-SNAPSHOT-shaded-hadoop2-yarn.jar \
  ./run spark.deploy.yarn.Client \
  --jar ./examples/target/spark-examples-0.8.0-SNAPSHOT-hadoop2-yarn.jar \
  --class spark.examples.SparkPi --args yarn-standalone \
  --num-workers 3 --master-memory 4g --worker-memory 2g \
  --worker-cores 1

and it worked fine.


Regards,
Mridul

Alexander Guzhva

May 16, 2013, 1:49:46 PM
to spark...@googlegroups.com
1) removed old spark sources
2) downloaded new ones through git clone
3) set to hadoop 2.0.4-alpha in project/SparkBuild.scala
4) took a look at pull 614
5) sbt clean compile
6) sbt assembly
7) sbt package
8) run
SPARK_JAR=./repl/target/scala-2.9.3/spark-repl_2.9.3-0.8.0-SNAPSHOT.jar ./run spark.deploy.yarn.Client --jar ./examples/target/scala-2.9.3/spark-examples_2.9.3-0.8.0-SNAPSHOT.jar --class spark.examples.SparkPi --args yarn-standalone

...
13/05/16 13:06:34 INFO yarn.Client: Application report from ASM:
     application identifier: application_1368540452662_0056
     appId: 56
     clientToken: null
     appDiagnostics: Application application_1368540452662_0056 failed 1 times due to AM Container for appattempt_1368540452662_0056_000001 exited with  exitCode: 1 due to:
.Failing this attempt.. Failing the application.
     appMasterHost: N/A
     appQueue: default
     appMasterRpcPort: 0
     appStartTime: 1368723992214
     yarnAppState: FAILED
     distributedFinalState: FAILED
     appTrackingUrl: master:8088/proxy/application_1368540452662_0056/
     appUser: hadoop

Hadoop's stderr log file says:
Error: Could not find or load main class spark.deploy.yarn.ApplicationMaster



hmm, ./repl-bin/ contains an invalid file
ok, trying something else
8.1)
[hadoop@master master]$ SPARK_JAR=./core/target/spark-core-assembly-0.8.0-SNAPSHOT.jar ./run spark.deploy.yarn.Client --jar ./examples/target/scala-2.9.3/spark-examples_2.9.3-0.8.0-SNAPSHOT.jar --class spark.examples.SparkPi
13/05/16 13:36:39 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/05/16 13:36:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/05/16 13:36:39 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/05/16 13:36:39 INFO yarn.Client: Got Cluster metric info from ASM, numNodeManagers=1
13/05/16 13:36:39 INFO yarn.Client: Queue info .. queueName=default, queueCurrentCapacity=0.0, queueMaxCapacity=1.0, queueApplicationCount=54, queueChildQueueCount=0
13/05/16 13:36:39 INFO yarn.Client: Max mem capabililty of resources in this cluster 8192
13/05/16 13:36:39 INFO yarn.Client: Setting up application submission context for ASM
13/05/16 13:36:39 INFO yarn.Client: Preparing Local resources
13/05/16 13:36:40 INFO yarn.Client: Uploading core/target/spark-core-assembly-0.8.0-SNAPSHOT.jar to hdfs://master:9000/user/hadoop/spark/58spark.jar
13/05/16 13:36:41 INFO yarn.Client: Uploading examples/target/scala-2.9.3/spark-examples_2.9.3-0.8.0-SNAPSHOT.jar to hdfs://master:9000/user/hadoop/spark/58app.jar
13/05/16 13:36:41 INFO yarn.Client: Setting up the launch environment
13/05/16 13:36:41 INFO yarn.Client: Setting up container launch context
13/05/16 13:36:41 INFO yarn.Client: Command for the ApplicationMaster: java -server -Xmx640m spark.deploy.yarn.ApplicationMaster --class spark.examples.SparkPi --jar ./examples/target/scala-2.9.3/spark-examples_2.9.3-0.8.0-SNAPSHOT.jar --worker-memory 1024 --worker-cores 1 --num-workers 2 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
13/05/16 13:36:41 INFO yarn.Client: Submitting application to ASM
13/05/16 13:36:41 INFO client.YarnClientImpl: Submitted application application_1368540452662_0058 to ResourceManager at master/192.168.56.101:8032
13/05/16 13:36:42 INFO yarn.Client: Application report from ASM:
     application identifier: application_1368540452662_0058
     appId: 58
     clientToken: null
     appDiagnostics:
     appMasterHost: N/A
     appQueue: default
     appMasterRpcPort: 0
     appStartTime: 1368725801935
     yarnAppState: ACCEPTED
     distributedFinalState: UNDEFINED
     appTrackingUrl: master:8088/proxy/application_1368540452662_0058/
     appUser: hadoop
13/05/16 13:36:43 INFO yarn.Client: Application report from ASM:
     application identifier: application_1368540452662_0058
     appId: 58
     clientToken: null
     appDiagnostics:
     appMasterHost: N/A
     appQueue: default
     appMasterRpcPort: 0
     appStartTime: 1368725801935
     yarnAppState: ACCEPTED
     distributedFinalState: UNDEFINED
     appTrackingUrl: master:8088/proxy/application_1368540452662_0058/
     appUser: hadoop
13/05/16 13:36:44 INFO yarn.Client: Application report from ASM:
     application identifier: application_1368540452662_0058
     appId: 58
     clientToken: null
     appDiagnostics:
     appMasterHost: master
     appQueue: default
     appMasterRpcPort: 0
     appStartTime: 1368725801935
     yarnAppState: RUNNING
     distributedFinalState: UNDEFINED
     appTrackingUrl: master:8088/proxy/application_1368540452662_0058/
     appUser: hadoop
13/05/16 13:36:45 INFO yarn.Client: Application report from ASM:
     application identifier: application_1368540452662_0058
     appId: 58
     clientToken: null
     appDiagnostics: Application application_1368540452662_0058 failed 1 times due to AM Container for appattempt_1368540452662_0058_000001 exited with exitCode: 1 due to:
.Failing this attempt.. Failing the application.
     appMasterHost: master
     appQueue: default
     appMasterRpcPort: 0
     appStartTime: 1368725801935
     yarnAppState: FAILED
     distributedFinalState: FAILED
     appTrackingUrl: master:8088/cluster/app/application_1368540452662_0058
     appUser: hadoop

hadoop says:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop/hadoop-2.0.4-alpha/tmp/nm-local-dir/usercache/hadoop/appcache/application_1368540452662_0058/filecache/-8637636619787811316/58spark.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop/hadoop-2.0.4-alpha/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/05/16 13:36:43 INFO yarn.ApplicationMaster: running as user hadoop
13/05/16 13:36:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/05/16 13:36:44 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1368540452662_0058_000001
13/05/16 13:36:44 INFO yarn.ApplicationMaster: Connecting to ResourceManager at master/192.168.56.101:8030
13/05/16 13:36:44 INFO yarn.ApplicationMaster: Registering the ApplicationMaster
13/05/16 13:36:44 INFO yarn.ApplicationMaster: Starting the user JAR in a separate Thread
13/05/16 13:36:44 INFO yarn.ApplicationMaster: Waiting for spark driver to be reachable.
13/05/16 13:36:44 ERROR yarn.ApplicationMaster: Failed to connect to driver at null:null
Usage: SparkPi <master> [<slices>]

8.2) ok, adding --args yarn-standalone to the command line


Failure; the Hadoop log file says:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop/hadoop-2.0.4-alpha/tmp/nm-local-dir/usercache/hadoop/appcache/application_1368540452662_0059/filecache/9179936078225164096/59spark.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop/hadoop-2.0.4-alpha/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/05/16 13:42:31 INFO yarn.ApplicationMaster: running as user hadoop
13/05/16 13:42:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/05/16 13:42:31 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1368540452662_0059_000001
13/05/16 13:42:31 INFO yarn.ApplicationMaster: Connecting to ResourceManager at master/192.168.56.101:8030
13/05/16 13:42:31 INFO yarn.ApplicationMaster: Registering the ApplicationMaster
13/05/16 13:42:31 INFO yarn.ApplicationMaster: Starting the user JAR in a separate Thread
13/05/16 13:42:31 INFO yarn.ApplicationMaster: Waiting for spark driver to be reachable.
13/05/16 13:42:31 ERROR yarn.ApplicationMaster: Failed to connect to driver at null:null
13/05/16 13:42:31 ERROR yarn.ApplicationMaster: Failed to connect to driver at master:0
13/05/16 13:42:32 ERROR yarn.ApplicationMaster: Failed to connect to driver at master:0
13/05/16 13:42:32 ERROR yarn.ApplicationMaster: Failed to connect to driver at master:0
13/05/16 13:42:32 ERROR yarn.ApplicationMaster: Failed to connect to driver at master:0
13/05/16 13:42:32 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
13/05/16 13:42:32 ERROR yarn.ApplicationMaster: Failed to connect to driver at master:0
13/05/16 13:42:32 ERROR yarn.ApplicationMaster: Failed to connect to driver at master:0
Exception in thread "Thread-2" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:154)
Caused by: org.jboss.netty.channel.ChannelException: Failed to bind to: master/192.168.56.101:0
    at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:297)
    at akka.remote.netty.NettyRemoteServer.start(Server.scala:53)
    at akka.remote.netty.NettyRemoteTransport.start(NettyRemoteSupport.scala:89)
    at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:94)
    at akka.actor.ActorSystemImpl._start(ActorSystem.scala:588)
    at akka.actor.ActorSystemImpl.start(ActorSystem.scala:595)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
    at spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:55)
    at spark.SparkEnv$.createFromSystemProperties(SparkEnv.scala:83)
    at spark.SparkContext.<init>(SparkContext.scala:85)
    at spark.examples.SparkPi$.main(SparkPi.scala:14)
    at spark.examples.SparkPi.main(SparkPi.scala)
    ... 5 more
Caused by: java.net.BindException: Cannot assign requested address
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:344)
    at sun.nio.ch.Net.bind(Net.java:336)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:199)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.bind(NioServerSocketPipelineSink.java:140)
    at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleServerSocket(NioServerSocketPipelineSink.java:90)
    at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:64)
    at org.jboss.netty.channel.Channels.bind(Channels.java:569)
    at org.jboss.netty.channel.AbstractChannel.bind(AbstractChannel.java:189)
    at org.jboss.netty.bootstrap.ServerBootstrap$Binder.channelOpen(ServerBootstrap.java:342)
    at org.jboss.netty.channel.Channels.fireChannelOpen(Channels.java:170)
    at org.jboss.netty.channel.socket.nio.NioServerSocketChannel.<init>(NioServerSocketChannel.java:80)
    at org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory.newChannel(NioServerSocketChannelFactory.java:158)
    at org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory.newChannel(NioServerSocketChannelFactory.java:86)
    at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:276)
    ... 16 more

ok then,
9) install maven
10) mvn -Phadoop2-yarn clean install -DskipTests=true

error:
/home/hadoop/spark/master/core/src/main/scala/spark/network/netty/FileHeader.scala
lines 3, 5, 6, 7: object netty is not a member of package io
the same for FileClientHandler.scala and BlockFetcherIterator.scala

so, it does not compile

Mridul Muralidharan

May 16, 2013, 2:41:32 PM
to spark...@googlegroups.com

Looks like you did not edit SparkBuild.scala,
so it did not include the yarn code ...

Use of Maven is preferable, since it forces us to choose a profile while not requiring code changes.

Regards
Mridul

Alexander Guzhva

May 16, 2013, 3:42:30 PM
to spark...@googlegroups.com
Hi Mridul,

My SparkBuild.scala:

import sbt._
import sbt.Classpaths.publishTask
import Keys._
import sbtassembly.Plugin._
import AssemblyKeys._
import twirl.sbt.TwirlPlugin._
// For Sonatype publishing
//import com.jsuereth.pgp.sbtplugin.PgpKeys._

object SparkBuild extends Build {
  // Hadoop version to build against. For example, "0.20.2", "0.20.205.0", or
  // "1.0.4" for Apache releases, or "0.20.2-cdh3u5" for Cloudera Hadoop.
  //val HADOOP_VERSION = "1.0.4"
  //val HADOOP_MAJOR_VERSION = "1"
  //val HADOOP_YARN = false

  // For Hadoop 2 versions such as "2.0.0-mr1-cdh4.1.1", set the HADOOP_MAJOR_VERSION to "2"
  //val HADOOP_VERSION = "2.0.0-mr1-cdh4.1.1"
  //val HADOOP_MAJOR_VERSION = "2"
  //val HADOOP_YARN = false

  // For Hadoop 2 YARN support
  val HADOOP_VERSION = "2.0.4-alpha"
  val HADOOP_MAJOR_VERSION = "2"
  val HADOOP_YARN = true

...




Mridul Muralidharan

May 16, 2013, 4:07:16 PM
to spark...@googlegroups.com
Hi,

Did not check the command line you were using - I think the
SPARK_JAR you are using, for sbt, might not be correct.
It should typically look something like
./core/target/spark-core-assembly-0.8.0-SNAPSHOT.jar


$ find . -type f -name '*spark*core*assembly*.jar'

should give you the right jar to pick ... for sbt.


For maven build, it would typically be

$ find . -type f -name '*spark*shade*.jar'


The assembled jar has all the dependencies merged into it, so it
will typically be 'large': 50 MB or higher.
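The two find commands above can be combined, and a size listing added, to sanity-check that SPARK_JAR points at an assembled jar rather than a thin module jar (the file patterns follow the two suggested above; adjust if your build names differ):

```shell
# List candidate assembled jars with their sizes; the right one should be
# tens of MB. The -r flag keeps xargs from running ls when nothing matches.
find . -type f \( -name '*spark*assembly*.jar' -o -name '*spark*shade*.jar' \) \
  | xargs -r ls -lh
```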

Regards,
Mridul

PS: Your version numbers above might be different ...

Alexander Guzhva

May 16, 2013, 4:46:23 PM
to spark...@googlegroups.com
Hi again,

I do see the .jar file in the right place.

Is it possible to find out the source of this "Failed to connect to driver at master:0" error? Where does that zero come from - what value is missing or not being read?
----
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop/hadoop-2.0.4-alpha/tmp/nm-local-dir/usercache/hadoop/appcache/application_1368540452662_0059/filecache/9179936078225164096/59spark.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop/hadoop-2.0.4-alpha/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/05/16 13:42:31 INFO yarn.ApplicationMaster: running as user hadoop
13/05/16 13:42:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/05/16 13:42:31 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1368540452662_0059_000001
13/05/16 13:42:31 INFO yarn.ApplicationMaster: Connecting to ResourceManager at master/192.168.56.101:8030
13/05/16 13:42:31 INFO yarn.ApplicationMaster: Registering the ApplicationMaster
13/05/16 13:42:31 INFO yarn.ApplicationMaster: Starting the user JAR in a separate Thread
13/05/16 13:42:31 INFO yarn.ApplicationMaster: Waiting for spark driver to be reachable.
13/05/16 13:42:31 ERROR yarn.ApplicationMaster: Failed to connect to driver at null:null
13/05/16 13:42:31 ERROR yarn.ApplicationMaster: Failed to connect to driver at master:0
13/05/16 13:42:32 ERROR yarn.ApplicationMaster: Failed to connect to driver at master:0
13/05/16 13:42:32 ERROR yarn.ApplicationMaster: Failed to connect to driver at master:0
13/05/16 13:42:32 ERROR yarn.ApplicationMaster: Failed to connect to driver at master:0



Mridul Muralidharan
May 16, 2013, 4:59:31 PM
That is part of waiting for the Spark master to initialize ...

We try to connect to the driver as it initializes itself: initially the
driver host is null (not yet set) and the port is 0 (bind and find a
free port). As initialization proceeds, the values get set, until init
completes and you see a message like:

"Master now available: <host>:<port>"


I hope you saw that!
Note that when running a YARN job, unlike other Spark modes, the
stdout/stderr goes to the ApplicationMaster's stdout/stderr files
(typically copied to HDFS, depending on your config).
So you will need to grep those to find the result ... I guess
you already know that, given the messages you are retrieving.
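For example, grepping for that line would look roughly like this (the log content below is simulated from the messages in this thread, and `/tmp/am_stderr.txt` is a stand-in; the real file lives in the NodeManager's container log directory, or in HDFS once aggregated):

```shell
# Simulated ApplicationMaster stderr; in a real run, grep the container
# logs (or their HDFS copy) for the same "Master now available" line.
cat > /tmp/am_stderr.txt <<'EOF'
13/05/16 13:42:31 INFO yarn.ApplicationMaster: Waiting for spark driver to be reachable.
13/05/16 13:45:02 INFO yarn.ApplicationMaster: Master now available: master:48211
EOF

grep 'Master now available' /tmp/am_stderr.txt
```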

Regards,
Mridul



Alexander Guzhva
May 16, 2013, 5:26:09 PM
Unfortunately, I did not see "Master now available: <host>:<port>",
only messages like this one, for many minutes:
13/05/16 13:42:31 ERROR yarn.ApplicationMaster: Failed to connect to driver at master:0



Mridul Muralidharan
May 16, 2013, 5:27:19 PM
Others can comment better - but this does look like a hostname resolution issue.
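One quick way to check for that (a sketch; `check_resolution` is a hypothetical helper, and `master` should be whatever hostname the ApplicationMaster prints):

```shell
# Warn if a hostname resolves to a loopback address, which breaks
# driver <-> ApplicationMaster connectivity across machines.
# An /etc/hosts entry like "127.0.0.1 master" is a classic cause.
check_resolution() {
  addr=$(getent hosts "$1" | awk '{ print $1; exit }')
  case "$addr" in
    "")        echo "WARN: $1 does not resolve" ;;
    127.*|::1) echo "WARN: $1 resolves to loopback ($addr)" ;;
    *)         echo "OK: $1 -> $addr" ;;
  esac
}

check_resolution localhost   # warns: loopback, as expected
check_resolution master      # on the cluster this should print a routable IP
```

Run it on both the master and the slave; every node needs to resolve `master` to the same routable address.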

- Mridul


Arun Ahuja
Jun 7, 2013, 5:05:19 PM
Has anyone had luck, or found more detailed documentation, with the above? Even after all the advice I still seem to hit the same error, where it attempts to copy an app jar and then says the jar does not exist, e.g.:

 at LocalTrace: 
    org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: File file:/home/hadoop/spark/19app.jar does not exist
    at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
    at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:819)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:491)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:218)
    at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
    at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)
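One possible cause (an assumption, not confirmed in this thread): the file: scheme in that path suggests the app jar was staged on the submitting machine's local filesystem, which a NodeManager on another node cannot see when localizing resources. It may be worth checking that core-site.xml points the default filesystem at HDFS, roughly like this (host and port are placeholders for your setup):

```xml
<!-- core-site.xml: hypothetical values; adjust host and port.
     If fs.defaultFS is file:///, staged jars stay on the local disk
     of the submitting machine and other nodes cannot fetch them. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>
```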

Alexander Guzhva
Jun 10, 2013, 9:30:26 AM
I still cannot start a jar properly, because of the 0.0.0.0 issue.

