Sparkling Water example fails on 3 node Spark/YARN/HDP 2.7 cluster

Fritz Geisler

Jun 18, 2015, 3:13:06 PM
to h2os...@googlegroups.com

I'm fairly new to the Spark/YARN/HDP environment.  I have a first-time installation with 3 servers: d001 is the master, and q001 and u004 are slaves.  There is no secondary NameNode configured.
I tried to run bin/run-example.sh for Sparkling Water:

./bin/run-example.sh --queue default

and received the following error:

15/06/18 14:21:58 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://spark...@dayrhencvu004.enterprisenet.org:47581] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/06/18 14:21:58 DEBUG component.AbstractLifeCycle: STOPPED qtp1713568869{8<=0<=0/254,5}
15/06/18 14:21:58 DEBUG component.AbstractLifeCycle: STOPPED org.spark-project.jetty.server.Server@2b0f373b
15/06/18 14:21:58 DEBUG spark.MapOutputTrackerMasterActor: [actor] received message StopMapOutputTracker from Actor[akka://sparkDriver/temp/$d]
15/06/18 14:21:58 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
15/06/18 14:21:58 DEBUG spark.MapOutputTrackerMasterActor: [actor] handled message (0.7 ms) StopMapOutputTracker from Actor[akka://sparkDriver/temp/$d]
15/06/18 14:21:58 INFO storage.MemoryStore: MemoryStore cleared
15/06/18 14:21:58 INFO storage.BlockManager: BlockManager stopped
15/06/18 14:21:58 DEBUG storage.BlockManagerMasterActor: [actor] received message StopBlockManagerMaster from Actor[akka://sparkDriver/temp/$e]
15/06/18 14:21:58 DEBUG storage.BlockManagerMasterActor: [actor] handled message (0.157 ms) StopBlockManagerMaster from Actor[akka://sparkDriver/temp/$e]
15/06/18 14:21:58 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
15/06/18 14:21:58 DEBUG scheduler.OutputCommitCoordinator$OutputCommitCoordinatorActor: [actor] received message StopCoordinator from Actor[akka://sparkDriver/deadLetters]
15/06/18 14:21:58 INFO spark.SparkContext: Successfully stopped SparkContext
15/06/18 14:21:58 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorActor: OutputCommitCoordinator stopped!
15/06/18 14:21:58 DEBUG scheduler.OutputCommitCoordinator$OutputCommitCoordinatorActor: [actor] handled message (0.514 ms) StopCoordinator from Actor[akka://sparkDriver/deadLetters]
15/06/18 14:21:58 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/06/18 14:21:58 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/06/18 14:21:58 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
15/06/18 14:22:08 DEBUG ipc.Client: IPC Client (1364127192) connection to dayrhencvd001.enterprisenet.org/10.7.53.17:8032 from jbossadm: closed
15/06/18 14:22:08 DEBUG ipc.Client: IPC Client (1364127192) connection to dayrhencvd001.enterprisenet.org/10.7.53.17:8032 from jbossadm: stopped, remaining connections 0
15/06/18 14:22:12 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
Exception in thread "main" java.lang.NullPointerException
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:544)
    at org.apache.spark.examples.h2o.AirlinesWithWeatherDemo2$.main(AirlinesWithWeatherDemo2.scala:25)
    at org.apache.spark.examples.h2o.AirlinesWithWeatherDemo2.main(AirlinesWithWeatherDemo2.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/06/18 14:22:12 DEBUG util.Utils: Shutdown hook called
15/06/18 14:22:12 DEBUG ipc.Client: stopping client from cache: org.apache.hadoop.ipc.Client@1494b84d

However, the spark-submit example for Spark Pi succeeds when I use the following:

./bin/spark-submit --class org.apache.spark.examples.SparkPi    \
--master yarn-cluster --queue default \
--num-executors 4 --driver-memory 1g --executor-memory 1g \
--executor-cores 1 /ncvprod/sas_share_04/hadoop/spark-1.3.1-bin-hadoop2.6/lib/spark-examples-1.3.1-hadoop2.6.0.jar \
10


The versions of software I am using are as follows:

RHEL 5.10
Hadoop 2.7.0
Spark bin for Hadoop 2.6.0
sparkling-water-0.2.101

I upgraded Hadoop from 2.6.0 to 2.7.0 after stumbling onto issue YARN-2414.

I configured YARN to use the Fair Scheduler after having difficulty with the user-name queue (the "queue : jbossadm unknown" error).  However, the Sparkling Water console output seems to suggest it is talking to the YarnClientSchedulerBackend.

Here is yarn-site.xml...

<configuration>
<!-- Site specific YARN configuration properties -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>dayrhencvd001.enterprisenet.org</value>
    <description>Hope this is the right way to specify the resource manager</description>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
    <name>yarn.scheduler.fair.user-as-default-queue</name>
    <value>false</value>
    <description>Trying to fix message that job is submitted to unknown queue</description>
</property>
</configuration>
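
For completeness, my understanding is that an alternative to forcing jobs into the default queue would be to declare queues explicitly in a Fair Scheduler allocations file (pointed to by yarn.scheduler.fair.allocation.file). Below is a minimal sketch of what I believe that file would look like; I have not actually created it, and the weights are just placeholders:

<?xml version="1.0"?>
<!-- fair-scheduler.xml sketch: declare queues explicitly instead of relying on user-as-default-queue -->
<allocations>
  <queue name="default">
    <weight>1.0</weight>
  </queue>
  <queue name="jbossadm">
    <weight>1.0</weight>
  </queue>
</allocations>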

I'm attaching a view of the All Applications page.  Any insights are appreciated.

Michal Malohlava

Jun 18, 2015, 7:37:05 PM
to h2os...@googlegroups.com
Hi Fritz,

The run-example.sh script is a wrapper around Spark's submit script which appends one additional library to the Spark classpath.
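
Roughly, it ends up invoking spark-submit with the Sparkling Water assembly appended, something like the simplified sketch below (this is not the literal script; the jar names and paths are only illustrative):

# simplified sketch of what run-example.sh does under the hood; paths are illustrative
"$SPARK_HOME/bin/spark-submit" \
  --class org.apache.spark.examples.h2o.AirlinesWithWeatherDemo2 \
  --jars assembly/build/libs/sparkling-water-assembly-0.2.101-all.jar \
  "$@" \
  examples/build/libs/sparkling-water-examples-0.2.101-all.jar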

Can you please run:

bin/run-example.sh AirlinesWithWeatherDemo2 --master yarn-cluster --queue default \
--num-executors 4 --driver-memory 1g --executor-memory 1g \
--executor-cores 1


Another potential problem is that AirlinesWithWeatherDemo2 is designed to be run in a local environment, since it uses local files and Spark had problems distributing them in yarn mode (version 1.2; we are using the Spark-provided way via SparkFiles).
I will check it with the newest version.
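
For reference, the SparkFiles approach looks roughly like this; it is only a minimal sketch of the idea, not the actual demo code, and the file path is a placeholder:

// minimal SparkFiles sketch (illustrative only; the path below is a placeholder)
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

val sc = new SparkContext(new SparkConf().setAppName("SparkFiles sketch"))
sc.addFile("/path/to/allyears2k.csv")             // ship the local file to every node
val localCopy = SparkFiles.get("allyears2k.csv")  // resolve the node-local copy where the file is read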

Michal

On 6/18/15 at 12:13 PM, Fritz Geisler wrote:

Fritz Geisler

Jun 19, 2015, 9:31:07 AM
to h2os...@googlegroups.com, mic...@h2oai.com
Thanks, Michal.  I'm not sure whether this has gotten me any farther or not.  The Spark work directory records a failure very soon after starting H2O.  Also, I didn't detect any activity in YARN.

15/06/19 08:37:45 INFO executor.Executor: Finished task 2.0 in stage 3.0 (TID 235). 642 bytes result sent to driver
15/06/19 08:37:46 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[H2O Launcher thread,5,main]
java.lang.NoClassDefFoundError: com/google/common/base/Predicate
        at water.api.Schema.registerAllSchemasIfNecessary(Schema.java:659)
        at water.api.RequestServer.start(RequestServer.java:438)
        at water.H2O.finalizeRegistration(H2O.java:921)
        at water.H2OApp.register(H2OApp.java:93)
        at water.H2OApp.driver(H2OApp.java:29)
        at water.H2OApp.main(H2OApp.java:21)
        at org.apache.spark.h2o.H2OContextUtils$$anonfun$5$$anon$1.run(H2OContextUtils.scala:129)
Caused by: java.lang.ClassNotFoundException: com.google.common.base.Predicate
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 7 more
15/06/19 08:37:46 DEBUG util.Utils: Shutdown hook called
15/06/19 08:37:46 DEBUG storage.DiskBlockManager: Shutdown hook called




Fritz Geisler

Jun 19, 2015, 12:09:34 PM
to h2os...@googlegroups.com, mic...@h2oai.com
I noticed that the run-example.sh script was running with the default MASTER of localhost.  I made a copy of bin/run-example.sh and changed this to yarn-client.  It did spin up containers on the slaves.  However, at this point the container shuts down after a few seconds.
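
For reference, the edit in my copy amounts to roughly the following (I'm quoting from memory, so the exact original line may differ):

# sketch of the change in my copy of run-example.sh; the exact original line may differ
# was roughly:  MASTER=${MASTER:-"localhost"}
MASTER=${MASTER:-"yarn-client"}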
Here is the log from q001...

2015-06-19 12:00:39,063 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1434594842515_0011_000001 (auth:SIMPLE)
2015-06-19 12:00:39,068 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1434594842515_0011_01_000001 by user jbossadm
2015-06-19 12:00:39,068 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1434594842515_0011
2015-06-19 12:00:39,068 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=jbossadm     IP=10.7.53.17   OPERATION=Start Container Request       TARGET=ContainerManageImpl      RESULT=SUCCESS  APPID=application_1434594842515_0011    CONTAINERID=container_1434594842515_0011_01_000001
2015-06-19 12:00:39,069 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1434594842515_0011 transitioned from NEW to INITING
2015-06-19 12:00:39,069 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Adding container_1434594842515_0011_01_000001 to application application_1434594842515_0011
2015-06-19 12:00:39,069 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1434594842515_0011 transitioned from INITING to RUNNING
2015-06-19 12:00:39,069 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1434594842515_0011_01_000001 transitioned from NEW to LOCALIZING
2015-06-19 12:00:39,070 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1434594842515_0011
2015-06-19 12:00:39,070 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://dayrhencvd001.enterprisenet.org:23456/user/jbossadm/.sparkStaging/application_1434594842515_0011/spark-assembly-1.3.1-hadoop2.6.0.jar transitioned from INIT to DOWNLOADING
2015-06-19 12:00:39,070 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1434594842515_0011_01_000001
2015-06-19 12:00:39,104 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /tmp/hadoop-jbossadm/nm-local-dir/nmPrivate/container_1434594842515_0011_01_000001.tokens. Credentials list:
2015-06-19 12:00:39,115 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user jbossadm
2015-06-19 12:00:39,135 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /tmp/hadoop-jbossadm/nm-local-dir/nmPrivate/container_1434594842515_0011_01_000001.tokens to /tmp/hadoop-jbossadm/nm-local-dir/usercache/jbossadm/appcache/application_1434594842515_0011/container_1434594842515_0011_01_000001.tokens
2015-06-19 12:00:39,136 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Localizer CWD set to /tmp/hadoop-jbossadm/nm-local-dir/usercache/jbossadm/appcache/application_1434594842515_0011 = file:/tmp/hadoop-jbossadm/nm-local-dir/usercache/jbossadm/appcache/application_1434594842515_0011
2015-06-19 12:00:40,666 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://dayrhencvd001.enterprisenet.org:23456/user/jbossadm/.sparkStaging/application_1434594842515_0011/spark-assembly-1.3.1-hadoop2.6.0.jar(->/tmp/hadoop-jbossadm/nm-local-dir/usercache/jbossadm/filecache/13/spark-assembly-1.3.1-hadoop2.6.0.jar) transitioned from DOWNLOADING to LOCALIZED
2015-06-19 12:00:40,667 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1434594842515_0011_01_000001 transitioned from LOCALIZING to LOCALIZED
2015-06-19 12:00:40,706 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1434594842515_0011_01_000001 transitioned from LOCALIZED to RUNNING
2015-06-19 12:00:40,766 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /tmp/hadoop-jbossadm/nm-local-dir/usercache/jbossadm/appcache/application_1434594842515_0011/container_1434594842515_0011_01_000001/default_container_executor.sh]
2015-06-19 12:00:41,240 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1434594842515_0011_01_000001
2015-06-19 12:00:41,251 WARN org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Unexpected: procfs stat file is not in the expected format for process with pid 3366
2015-06-19 12:00:41,259 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 24708 for container-id container_1434594842515_0011_01_000001: 45.8 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory used
2015-06-19 12:00:44,270 WARN org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Unexpected: procfs stat file is not in the expected format for process with pid 3366
2015-06-19 12:00:44,279 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 24708 for container-id container_1434594842515_0011_01_000001: 232.5 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used
2015-06-19 12:00:44,279 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_1434594842515_0011_01_000001 has processes older than 1 iteration running over the configured limit. Limit=2254857728, current usage = 2376425472
2015-06-19 12:00:44,280 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=24708,containerID=container_1434594842515_0011_01_000001] is running beyond virtual memory limits. Current usage: 232.5 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1434594842515_0011_01_000001 :
        |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
        |- 24708 24706 24708 24708 (bash) 0 0 65380352 277 /bin/bash -c /ncvprod/sas_share_04/hadoop/jdk/bin/java -server -Xmx512m -Djava.io.tmpdir=/tmp/hadoop-jbossadm/nm-local-dir/usercache/jbossadm/appcache/application_1434594842515_0011/container_1434594842515_0011_01_000001/tmp '-Dspark.executor.memory=2g' '-Dspark.executor.instances=4' '-Dspark.driver.port=60710' '-Dspark.driver.memory=1G' '-Dspark.driver.appUIAddress=http://dayrhencvd001.enterprisenet.org:4040' '-Dspark.master=yarn-client' '-Dspark.yarn.queue=default' '-Dspark.fileserver.uri=http://10.7.53.17:15848' '-Dspark.executor.id=<driver>' '-Dspark.jars=file:/ncvprod/sas_share_04/hadoop/sparkling-water-0.2.101/assembly/build/libs/sparkling-water-assembly-0.2.101-all.jar,file:/ncvprod/sas_share_04/hadoop/sparkling-water-0.2.101/assembly/build/libs/sparkling-water-assembly-0.2.101-all.jar' '-Dspark.driver.host=dayrhencvd001.enterprisenet.org' '-Dspark.executor.cores=1' '-Dspark.tachyonStore.folderName=spark-c84c9583-ffb9-475b-a7f9-5c57c15ff076' '-Dspark.app.name=Sparkling Water Meetup: Use Airlines and Weather Data for delay prediction' '-Dspark.driver.extraJavaOptions=' -Dspark.yarn.app.container.log.dir=/ncvprod/sas_share_04/hadoop/hadoop-2.7.0/logs/userlogs/application_1434594842515_0011/container_1434594842515_0011_01_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg 'dayrhencvd001.enterprisenet.org:60710' --executor-memory 2048m --executor-cores 1 --num-executors  4 1> /ncvprod/sas_share_04/hadoop/hadoop-2.7.0/logs/userlogs/application_1434594842515_0011/container_1434594842515_0011_01_000001/stdout 2> /ncvprod/sas_share_04/hadoop/hadoop-2.7.0/logs/userlogs/application_1434594842515_0011/container_1434594842515_0011_01_000001/stderr
        |- 24711 24708 24708 24708 (java) 676 45 2311045120 59253 /ncvprod/sas_share_04/hadoop/jdk/bin/java -server -Xmx512m -Djava.io.tmpdir=/tmp/hadoop-jbossadm/nm-local-dir/usercache/jbossadm/appcache/application_1434594842515_0011/container_1434594842515_0011_01_000001/tmp -Dspark.executor.memory=2g -Dspark.executor.instances=4 -Dspark.driver.port=60710 -Dspark.driver.memory=1G -Dspark.driver.appUIAddress=http://dayrhencvd001.enterprisenet.org:4040 -Dspark.master=yarn-client -Dspark.yarn.queue=default -Dspark.fileserver.uri=http://10.7.53.17:15848 -Dspark.executor.id=<driver> -Dspark.jars=file:/ncvprod/sas_share_04/hadoop/sparkling-water-0.2.101/assembly/build/libs/sparkling-water-assembly-0.2.101-all.jar,file:/ncvprod/sas_share_04/hadoop/sparkling-water-0.2.101/assembly/build/libs/sparkling-water-assembly-0.2.101-all.jar -Dspark.driver.host=dayrhencvd001.enterprisenet.org -Dspark.executor.cores=1 -Dspark.tachyonStore.folderName=spark-c84c9583-ffb9-475b-a7f9-5c57c15ff076 -Dspark.app.name=Sparkling Water Meetup: Use Airlines and Weather Data for delay prediction -Dspark.driver.extraJavaOptions= -Dspark.yarn.app.container.log.dir=/ncvprod/sas_share_04/hadoop/hadoop-2.7.0/logs/userlogs/application_1434594842515_0011/container_1434594842515_0011_01_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg dayrhencvd001.enterprisenet.org:60710 --executor-memory 2048m --executor-cores 1 --num-executors 4

2015-06-19 12:00:44,280 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Removed ProcessTree with root 24708
2015-06-19 12:00:44,280 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1434594842515_0011_01_000001 transitioned from RUNNING to KILLING
2015-06-19 12:00:44,281 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1434594842515_0011_01_000001
2015-06-19 12:00:44,289 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1434594842515_0011_01_000001 is : 143
2015-06-19 12:00:44,310 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1434594842515_0011_01_000001 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
2015-06-19 12:00:44,311 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=jbossadm     OPERATION=Container Finished - Killed   TARGET=ContainerImpl    RESULT=SUCCESS  APPID=application_1434594842515_0011    CONTAINERID=container_1434594842515_0011_01_000001
2015-06-19 12:00:44,311 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1434594842515_0011_01_000001 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to DONE
2015-06-19 12:00:44,311 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Removing container_1434594842515_0011_01_000001 from application application_1434594842515_0011
2015-06-19 12:00:44,311 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1434594842515_0011
2015-06-19 12:00:46,070 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed containers from NM context: [container_1434594842515_0011_01_000001]
2015-06-19 12:00:47,282 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1434594842515_0011_01_000001
2015-06-19 12:00:51,081 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1434594842515_0011 transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP
2015-06-19 12:00:51,082 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_STOP for appId application_1434594842515_0011
2015-06-19 12:00:51,082 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1434594842515_0011 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
2015-06-19 12:00:51,082 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler: Scheduling Log Deletion for application: application_1434594842515_0011, with delay of 10800 seconds
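
From the "running beyond virtual memory limits" message above, it looks like the NodeManager's virtual-memory check is what kills the container. My understanding is that this can be relaxed in yarn-site.xml roughly as sketched below (these are standard Hadoop properties, but the ratio value is just a guess on my part, and I have not applied this yet):

<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
  <description>Disable the virtual-memory check that appears to be killing the container</description>
</property>
<!-- or, instead of disabling the check, raise the vmem/pmem ratio (the default is 2.1) -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>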

Michal Malohlava

Jun 19, 2015, 1:00:19 PM
to Fritz Geisler, h2os...@googlegroups.com
Hi Fritz, which version of Sparkling Water are you using?
1.3.5 ? http://h2o-release.s3.amazonaws.com/sparkling-water/rel-1.3/5/index.html

michal


On 6/19/15 at 6:31 AM, Fritz Geisler wrote:

Fritz Geisler

Jun 19, 2015, 1:09:41 PM
to mic...@h2oai.com, h2os...@googlegroups.com
Whichever is in that zip file, sparkling-water-0.2.101.zip

Michal Malohlava

Jun 19, 2015, 1:22:24 PM
to Fritz Geisler, h2os...@googlegroups.com
Ahh, good catch! I have to fix the run-example script.

Do you have the Spark logs?
Or can you share with us the full YARN log, collected via `yarn logs -applicationId <application ID>`?
(You can send them to my e-mail.)


Thank you!
Michal



On 6/19/15 at 9:09 AM, Fritz Geisler wrote:

Michal Malohlava

Jun 19, 2015, 2:18:43 PM
to Fritz Geisler, h2os...@googlegroups.com
Fritz, can you try the newest version?

m.
On 6/19/15 at 10:09 AM, Fritz Geisler wrote:
> 1.3.5 ? http://h2o-release.s3.amazonaws.com/sparkling-water/rel-1.3/5/index.html
