I noticed that the run-example.sh script was running against the default
MASTER, localhost. I made a copy of bin/run-example.sh and changed MASTER to
yarn-client. That did spin up containers on the slave nodes. However, the
container now shuts down after a few seconds.
Here is the log from q001:
2015-06-19 12:00:39,063 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1434594842515_0011_000001 (auth:SIMPLE)
2015-06-19 12:00:39,068 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1434594842515_0011_01_000001 by user jbossadm
2015-06-19 12:00:39,068 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1434594842515_0011
2015-06-19 12:00:39,068 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=jbossadm IP=10.7.53.17 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1434594842515_0011 CONTAINERID=container_1434594842515_0011_01_000001
2015-06-19 12:00:39,069 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1434594842515_0011 transitioned from NEW to INITING
2015-06-19 12:00:39,069 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Adding container_1434594842515_0011_01_000001 to application application_1434594842515_0011
2015-06-19 12:00:39,069 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1434594842515_0011 transitioned from INITING to RUNNING
2015-06-19 12:00:39,069 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1434594842515_0011_01_000001 transitioned from NEW to LOCALIZING
2015-06-19 12:00:39,070 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1434594842515_0011
2015-06-19 12:00:39,070 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://dayrhencvd001.enterprisenet.org:23456/user/jbossadm/.sparkStaging/application_1434594842515_0011/spark-assembly-1.3.1-hadoop2.6.0.jar transitioned from INIT to DOWNLOADING
2015-06-19 12:00:39,070 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1434594842515_0011_01_000001
2015-06-19 12:00:39,104 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /tmp/hadoop-jbossadm/nm-local-dir/nmPrivate/container_1434594842515_0011_01_000001.tokens. Credentials list:
2015-06-19 12:00:39,115 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user jbossadm
2015-06-19 12:00:39,135 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /tmp/hadoop-jbossadm/nm-local-dir/nmPrivate/container_1434594842515_0011_01_000001.tokens to /tmp/hadoop-jbossadm/nm-local-dir/usercache/jbossadm/appcache/application_1434594842515_0011/container_1434594842515_0011_01_000001.tokens
2015-06-19 12:00:39,136 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Localizer CWD set to /tmp/hadoop-jbossadm/nm-local-dir/usercache/jbossadm/appcache/application_1434594842515_0011 = file:/tmp/hadoop-jbossadm/nm-local-dir/usercache/jbossadm/appcache/application_1434594842515_0011
2015-06-19 12:00:40,666 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://dayrhencvd001.enterprisenet.org:23456/user/jbossadm/.sparkStaging/application_1434594842515_0011/spark-assembly-1.3.1-hadoop2.6.0.jar(->/tmp/hadoop-jbossadm/nm-local-dir/usercache/jbossadm/filecache/13/spark-assembly-1.3.1-hadoop2.6.0.jar) transitioned from DOWNLOADING to LOCALIZED
2015-06-19 12:00:40,667 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1434594842515_0011_01_000001 transitioned from LOCALIZING to LOCALIZED
2015-06-19 12:00:40,706 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1434594842515_0011_01_000001 transitioned from LOCALIZED to RUNNING
2015-06-19 12:00:40,766 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /tmp/hadoop-jbossadm/nm-local-dir/usercache/jbossadm/appcache/application_1434594842515_0011/container_1434594842515_0011_01_000001/default_container_executor.sh]
2015-06-19 12:00:41,240 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1434594842515_0011_01_000001
2015-06-19 12:00:41,251 WARN org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Unexpected: procfs stat file is not in the expected format for process with pid 3366
2015-06-19 12:00:41,259 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 24708 for container-id container_1434594842515_0011_01_000001: 45.8 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory used
2015-06-19 12:00:44,270 WARN org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Unexpected: procfs stat file is not in the expected format for process with pid 3366
2015-06-19 12:00:44,279 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 24708 for container-id container_1434594842515_0011_01_000001: 232.5 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used
2015-06-19 12:00:44,279 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_1434594842515_0011_01_000001 has processes older than 1 iteration running over the configured limit. Limit=2254857728, current usage = 2376425472
2015-06-19 12:00:44,280 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=24708,containerID=container_1434594842515_0011_01_000001] is running beyond virtual memory limits. Current usage: 232.5 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1434594842515_0011_01_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 24708 24706 24708 24708 (bash) 0 0 65380352 277 /bin/bash -c /ncvprod/sas_share_04/hadoop/jdk/bin/java -server -Xmx512m -Djava.io.tmpdir=/tmp/hadoop-jbossadm/nm-local-dir/usercache/jbossadm/appcache/application_1434594842515_0011/container_1434594842515_0011_01_000001/tmp '-Dspark.executor.memory=2g' '-Dspark.executor.instances=4' '-Dspark.driver.port=60710' '-Dspark.driver.memory=1G' '-Dspark.driver.appUIAddress=http://dayrhencvd001.enterprisenet.org:4040' '-Dspark.master=yarn-client' '-Dspark.yarn.queue=default' '-Dspark.fileserver.uri=http://10.7.53.17:15848' '-Dspark.executor.id=<driver>' '-Dspark.jars=file:/ncvprod/sas_share_04/hadoop/sparkling-water-0.2.101/assembly/build/libs/sparkling-water-assembly-0.2.101-all.jar,file:/ncvprod/sas_share_04/hadoop/sparkling-water-0.2.101/assembly/build/libs/sparkling-water-assembly-0.2.101-all.jar' '-Dspark.driver.host=dayrhencvd001.enterprisenet.org' '-Dspark.executor.cores=1' '-Dspark.tachyonStore.folderName=spark-c84c9583-ffb9-475b-a7f9-5c57c15ff076' '-Dspark.app.name=Sparkling Water Meetup: Use Airlines and Weather Data for delay prediction' '-Dspark.driver.extraJavaOptions=' -Dspark.yarn.app.container.log.dir=/ncvprod/sas_share_04/hadoop/hadoop-2.7.0/logs/userlogs/application_1434594842515_0011/container_1434594842515_0011_01_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg 'dayrhencvd001.enterprisenet.org:60710' --executor-memory 2048m --executor-cores 1 --num-executors 4 1> /ncvprod/sas_share_04/hadoop/hadoop-2.7.0/logs/userlogs/application_1434594842515_0011/container_1434594842515_0011_01_000001/stdout 2> /ncvprod/sas_share_04/hadoop/hadoop-2.7.0/logs/userlogs/application_1434594842515_0011/container_1434594842515_0011_01_000001/stderr
|- 24711 24708 24708 24708 (java) 676 45 2311045120 59253 /ncvprod/sas_share_04/hadoop/jdk/bin/java -server -Xmx512m -Djava.io.tmpdir=/tmp/hadoop-jbossadm/nm-local-dir/usercache/jbossadm/appcache/application_1434594842515_0011/container_1434594842515_0011_01_000001/tmp -Dspark.executor.memory=2g -Dspark.executor.instances=4 -Dspark.driver.port=60710 -Dspark.driver.memory=1G -Dspark.driver.appUIAddress=http://dayrhencvd001.enterprisenet.org:4040 -Dspark.master=yarn-client -Dspark.yarn.queue=default -Dspark.fileserver.uri=http://10.7.53.17:15848 -Dspark.executor.id=<driver> -Dspark.jars=file:/ncvprod/sas_share_04/hadoop/sparkling-water-0.2.101/assembly/build/libs/sparkling-water-assembly-0.2.101-all.jar,file:/ncvprod/sas_share_04/hadoop/sparkling-water-0.2.101/assembly/build/libs/sparkling-water-assembly-0.2.101-all.jar -Dspark.driver.host=dayrhencvd001.enterprisenet.org -Dspark.executor.cores=1 -Dspark.tachyonStore.folderName=spark-c84c9583-ffb9-475b-a7f9-5c57c15ff076 -Dspark.app.name=Sparkling Water Meetup: Use Airlines and Weather Data for delay prediction -Dspark.driver.extraJavaOptions= -Dspark.yarn.app.container.log.dir=/ncvprod/sas_share_04/hadoop/hadoop-2.7.0/logs/userlogs/application_1434594842515_0011/container_1434594842515_0011_01_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg dayrhencvd001.enterprisenet.org:60710 --executor-memory 2048m --executor-cores 1 --num-executors 4
2015-06-19 12:00:44,280 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Removed ProcessTree with root 24708
2015-06-19 12:00:44,280 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1434594842515_0011_01_000001 transitioned from RUNNING to KILLING
2015-06-19 12:00:44,281 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1434594842515_0011_01_000001
2015-06-19 12:00:44,289 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1434594842515_0011_01_000001 is : 143
2015-06-19 12:00:44,310 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1434594842515_0011_01_000001 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
2015-06-19 12:00:44,311 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=jbossadm OPERATION=Container Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1434594842515_0011 CONTAINERID=container_1434594842515_0011_01_000001
2015-06-19 12:00:44,311 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1434594842515_0011_01_000001 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to DONE
2015-06-19 12:00:44,311 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Removing container_1434594842515_0011_01_000001 from application application_1434594842515_0011
2015-06-19 12:00:44,311 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1434594842515_0011
2015-06-19 12:00:46,070 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed containers from NM context: [container_1434594842515_0011_01_000001]
2015-06-19 12:00:47,282 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1434594842515_0011_01_000001
2015-06-19 12:00:51,081 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1434594842515_0011 transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP
2015-06-19 12:00:51,082 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_STOP for appId application_1434594842515_0011
2015-06-19 12:00:51,082 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1434594842515_0011 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
2015-06-19 12:00:51,082 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler: Scheduling Log Deletion for application: application_1434594842515_0011, with delay of 10800 seconds
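
The figures in the "running beyond virtual memory limits" warning can be cross-checked against the process-tree dump with a little arithmetic. Below is a minimal Python sketch of that check. All numbers are copied from the log above; the function name `summarize` and the dictionary keys are made up for illustration. The "implied ratio" rests on an assumption that the virtual memory limit is the 1 GB physical allocation multiplied by some factor, which the log suggests ("2.1 GB of 2.1 GB") but does not state by name.

```python
# Figures copied from the NodeManager log above.
PHYSICAL_LIMIT_BYTES = 1024 ** 3   # "1 GB physical memory"
VMEM_LIMIT_BYTES = 2254857728      # "Limit=2254857728"
VMEM_USAGE_BYTES = 2376425472      # "current usage = 2376425472"

# VMEM_USAGE(BYTES) column of the process-tree dump, keyed by PID.
PROCESS_VMEM_BYTES = {
    24708: 65380352,     # bash wrapper
    24711: 2311045120,   # java (ExecutorLauncher)
}


def summarize():
    """Cross-check the logged limit and usage against the process tree."""
    tree_total = sum(PROCESS_VMEM_BYTES.values())
    return {
        "tree_total_bytes": tree_total,
        "matches_reported_usage": tree_total == VMEM_USAGE_BYTES,
        "over_limit_bytes": VMEM_USAGE_BYTES - VMEM_LIMIT_BYTES,
        # Assumption: limit = physical allocation * ratio.
        "implied_vmem_ratio": round(VMEM_LIMIT_BYTES / PHYSICAL_LIMIT_BYTES, 2),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

On these numbers the two processes in the dump add up exactly to the reported usage, and the java process accounts for nearly all of it even though it was started with -Xmx512m. The container is roughly 116 MiB over the limit, while physical memory use (232.5 MB of 1 GB) is nowhere near its cap.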