java.lang.RuntimeException: Cloud size under 50

Jonghoon Joey Ahnn

Mar 16, 2016, 3:24:39 AM
to H2O Open Source Scalable Machine Learning - h2ostream
I am running sparkling water 1.5.3 on HDP 2.2.4. Sometimes (not always) I am getting the following error.
Any idea what the "Cloud size under 50" error I got means?

===================================================
bash-4.1$ ./sparkling-water-shell.sh

-----
  Spark master (MASTER)     : local-cluster[10,2,8192]
  Spark home   (SPARK_HOME) : /userapps/hadoop/spark-1.6.0
  H2O build version         : 3.2.0.9 (slater)
  Spark build version       : 1.5.0
----

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
16/03/16 02:16:59 WARN HiveConf: HiveConf of name hive.optimize.mapjoin.mapreduce does not exist
16/03/16 02:16:59 WARN HiveConf: HiveConf of name hive.heapsize does not exist
16/03/16 02:16:59 WARN HiveConf: HiveConf of name hive.semantic.analyzer.factory.impl does not exist
16/03/16 02:16:59 WARN HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
16/03/16 02:16:59 WARN HiveConf: HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist
Spark context available as sc.
16/03/16 02:17:26 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/03/16 02:17:26 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
SQL context available as sqlContext.
Loading /home_dir/svdfe001/var/ts-features-wf/test.scala...
import com.typesafe.config._
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions._
import org.apache.spark.sql.DataFrameNaFunctions
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.hive.orc._
import org.apache.spark.sql._
import org.apache.spark.h2o._
import water._
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.WindowSpec
import org.apache.spark.sql.Column
import org.apache.spark.sql.Row
16/03/16 02:17:48 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@7b919e50
import sqlContext.implicits._
16/03/16 02:17:55 WARN H2OContext: Increasing 'spark.locality.wait' to value 30000
java.lang.RuntimeException: Cloud size under 50                                 
at water.H2O.waitForCloudSize(H2O.java:1374)
at org.apache.spark.h2o.H2OContext.start(H2OContext.scala:154)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:64)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:69)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:71)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:73)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:75)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:77)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:79)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:81)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:83)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:85)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:87)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:89)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:91)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:93)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:95)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:97)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:99)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:101)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:103)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:105)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:107)
at $iwC$$iwC$$iwC.<init>(<console>:109)
at $iwC$$iwC.<init>(<console>:111)
at $iwC.<init>(<console>:113)
at <init>(<console>:115)
at .<init>(<console>:119)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(SparkILoop.scala:680)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(SparkILoop.scala:677)
at scala.reflect.io.Streamable$Chars$class.applyReader(Streamable.scala:104)
at scala.reflect.io.File.applyReader(File.scala:82)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SparkILoop.scala:677)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply(SparkILoop.scala:677)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply(SparkILoop.scala:677)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$savingReplayStack(SparkILoop.scala:162)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1.apply$mcV$sp(SparkILoop.scala:676)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1.apply(SparkILoop.scala:676)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1.apply(SparkILoop.scala:676)
at org.apache.spark.repl.SparkILoop.savingReader(SparkILoop.scala:167)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$interpretAllFrom(SparkILoop.scala:675)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$loadCommand$1.apply(SparkILoop.scala:740)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$loadCommand$1.apply(SparkILoop.scala:739)
at org.apache.spark.repl.SparkILoop.withFile(SparkILoop.scala:733)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loadCommand(SparkILoop.scala:739)
at org.apache.spark.repl.SparkILoop$$anonfun$standardCommands$7.apply(SparkILoop.scala:344)
at org.apache.spark.repl.SparkILoop$$anonfun$standardCommands$7.apply(SparkILoop.scala:344)
at scala.tools.nsc.interpreter.LoopCommands$LineCmd.apply(LoopCommands.scala:81)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:809)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$loadFiles$1.apply(SparkILoop.scala:910)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$loadFiles$1.apply(SparkILoop.scala:908)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loadFiles(SparkILoop.scala:908)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:995)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Michal Malohlava

Mar 16, 2016, 4:48:55 PM
to h2os...@googlegroups.com
Hi Joey,

the problem is that H2O cannot figure out the topology of the Spark cluster.

You should:
  - use a Sparkling Water version compatible with Spark 1.6 (see http://www.h2o.ai/download/sparkling-water/choose or http://h2o-release.s3.amazonaws.com/sparkling-water/rel-1.6/1/index.html), or stay on Spark 1.5 if you need the older version of Sparkling Water

  - look into the Sparkling Water tuning guide: https://github.com/h2oai/sparkling-water/blob/master/DEVEL.md#SparklingWaterTuning

  - I would recommend passing the following configuration option to your job:
   `--conf spark.scheduler.minRegisteredResourcesRatio=1`. It forces Spark to wait until all resources are available, which helps H2O detect the cloud topology (a minimal sketch follows below).
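
For illustration, here is a minimal sketch of the same setting applied programmatically (assuming the Sparkling Water 1.5.x API, where the H2OContext is built from the SparkContext and started explicitly; the app name is illustrative):

------------------------------
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.h2o.H2OContext

// Force Spark to wait until 100% of the requested executor resources
// have registered before scheduling starts, so H2O sees the whole
// cluster when it forms its cloud.
val conf = new SparkConf()
  .setAppName("sparkling-water-example")  // illustrative name
  .set("spark.scheduler.minRegisteredResourcesRatio", "1")

val sc = new SparkContext(conf)

// Sparkling Water 1.5.x style; the stack trace above shows the
// cloud-size check happens inside this start() call.
val h2oContext = new H2OContext(sc).start()
------------------------------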

Please let us know if it helps!
We are working on redesigning the clouding strategy, so any feedback is welcome.

Thank you!
michal

Jonghoon Joey Ahnn

Mar 16, 2016, 5:58:46 PM
to H2O Open Source Scalable Machine Learning - h2ostream, mic...@h2oai.com
Hi Michal,

I applied the Spark property you suggested and changed to Spark 1.5 with sparkling-water 1.5.3.
Unfortunately, I am getting the same cloud size error. It seems that neither spark.scheduler.maxRegisteredResourcesWaitingTime nor spark.scheduler.minRegisteredResourcesRatio helps to resolve the issue.

My script is the following:
------------------------------
bin/sparkling-shell -i /home_dir/svdfe001/var/ts-features-wf/tmp/test.scala \
 --master yarn \
 --conf spark.executor.memory=16g \
 --conf spark.driver.memory=16g \
 --conf spark.driver.maxResultSize=8g \
 --conf spark.executor.instances=200 \
 --conf spark.yarn.queue=analysis \
 --conf spark.scheduler.maxRegisteredResourcesWaitingTime=1000000 \
 --conf spark.scheduler.minRegisteredResourcesRatio=1
------------------------------
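
As a quick sanity check (note that the banner in the log below still reports local-cluster[10,2,8192] as the master even though the script passes --master yarn), the effective settings can be queried from the running shell; a hypothetical REPL snippet:

------------------------------
// Hypothetical REPL check: confirm the master and executor settings
// Spark actually registered before H2OContext is started.
println(sc.master)  // expect a "yarn..." master, not "local-cluster[...]"
println(sc.getConf.get("spark.executor.instances", "<unset>"))
println(sc.getConf.get("spark.scheduler.minRegisteredResourcesRatio", "<unset>"))
------------------------------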

The error log looks like this:
-----
  Spark master (MASTER)     : local-cluster[10,2,8192]
  Spark home   (SPARK_HOME) : /userapps/hadoop/spark-1.5.0-rc3
  H2O build version         : 3.2.0.9 (slater)
  Spark build version       : 1.5.0
----

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
16/03/16 16:46:49 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
Spark context available as sc.
16/03/16 16:47:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/16 16:47:39 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
SQL context available as sqlContext.
Loading /home_dir/svdfe001/var/ts-features-wf/tmp/test.scala...
import com.typesafe.config._
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions._
import org.apache.spark.sql.DataFrameNaFunctions
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.hive.orc._
import org.apache.spark.sql._
import org.apache.spark.h2o._
import water._
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.WindowSpec
import org.apache.spark.sql.Column
import org.apache.spark.sql.Row
16/03/16 16:47:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/16 16:48:05 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@cdc38be
import sqlContext.implicits._
16/03/16 16:48:07 WARN H2OContext: Increasing 'spark.locality.wait' to value 30000
java.lang.RuntimeException: Cloud size under 200                                
at water.H2O.waitForCloudSize(H2O.java:1374)
at org.apache.spark.h2o.H2OContext.start(H2OContext.scala:154)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:58)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:63)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:65)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:67)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:69)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:71)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.

Michal Malohlava

Mar 16, 2016, 7:22:38 PM
to Jonghoon Joey Ahnn, H2O Open Source Scalable Machine Learning - h2ostream
Hi Joey,

can you try the latest Sparkling Water 1.5.12 (or 1.6.1 for Spark 1.6)?

Michal

mic...@0xdata.com

Mar 24, 2016, 1:33:19 PM
to H2O Open Source Scalable Machine Learning - h2ostream, jha...@gmail.com, mic...@h2oai.com
Hi Joey,

any progress?

Thank you!
Michal

Jonghoon Joey Ahnn

Mar 28, 2016, 1:04:42 PM
to mic...@0xdata.com, H2O Open Source Scalable Machine Learning - h2ostream, mic...@h2oai.com
Hi Michal,

Our team upgraded from HDP 2.2.4 to HDP 2.3.4, and I am having a serious H2O context invocation issue.

I used sparkling-water 1.5.3 on HDP 2.2.4 without problems earlier.
After the upgrade, I tested different versions of H2O (sparkling-water) on Spark 1.5 and 1.6 on HDP 2.3.4. The logs are attached.
As you can see, there were both successful and failed runs on the same setting. I tested each setting with about 10 runs:
  • sparkling-water 1.5.3 with Spark 1.5: about 2/3 of runs failed
  • sparkling-water 1.5.3 with Spark 1.5.0-rc3: about 90% of runs failed
  • sparkling-water 1.5.12 with Spark 1.5: about 2/3 of runs failed
  • sparkling-water 1.5.12 with Spark 1.5.0-rc3: no more than 3 consecutive runs were successful
  • sparkling-water 1.6.1 with Spark 1.6.1-rc1: no successful runs were observed
Has anyone reported an issue like mine on your side?
Thanks.
-Joey

h2o1.5.3_spark1.5_fail.log
h2o1.5.3_spark1.5_suc.log
h2o1.5.3_spark1.5.0-rc3_fail.log
h2o1.5.3_spark1.5.0-rc3_succ.log
h2o1.5.12_spark1.5_fail.log
h2o1.5.12_spark1.5_succ.log
h2o1.5.12_spark1.5.0-rc3_fail.log
h2o1.5.12_spark1.5.0-rc3_succ.log
h2o1.6.1_spark1.6.1-rc1_fail.log

Michal Malohlava

Mar 28, 2016, 1:55:43 PM
to Jonghoon Joey Ahnn, mic...@0xdata.com, H2O Open Source Scalable Machine Learning - h2ostream
Hi Joey,

thanks for all the information! It is really useful!

Regarding 1.5.12 - did you try to pass `--conf spark.scheduler.minRegisteredResourcesRatio=1` to force Spark to wait for all resources?

Regarding 1.6 - we saw similar behavior in our HDP infrastructure; however, after fixing H2ORDD locality the problems disappeared (tested on HDP 2.2 and 2.4).
It would be really helpful if you could share the YARN logs.

Thank you for your feedback, and sorry for the inconvenience,
Michal



Jonghoon Joey Ahnn

Mar 28, 2016, 3:40:25 PM
to mic...@h2oai.com, mic...@0xdata.com, H2O Open Source Scalable Machine Learning - h2ostream
Hi Michal,

I further tested sparkling-water 1.5.12 with various Spark versions with "--conf spark.scheduler.minRegisteredResourcesRatio=1". The logs are attached. In summary, nothing worked for me, though as I experienced earlier, other runs on the same settings may occasionally succeed.

For sparkling-water 1.6.1, I got both success and failure logs, which are attached. For the failure, I am also attaching the YARN log (h2o1.6.1_spark1.6.1_fail_minsrcratio1.yarn.log).

Do you think this is an issue only with HDP 2.3? Is there any reason you skipped testing on HDP 2.3?
I hope this helps to dig into the issue.

Thanks.
-Joey

h2o1.5.12_spark1.5.0_fail_minsrcratio1.log
h2o1.5.12_spark1.5.0-rc3_fail_minsrcratio1.log
h2o1.5.12_spark1.5.1_fail_minsrcratio1.log
h2o1.5.12_spark1.6.0_fail_minsrcratio1.log
h2o1.5.12_spark1.6.1_fail_minsrcratio1.log
h2o1.6.1_spark1.6.1_fail_minsrcratio1.log
h2o1.6.1_spark1.6.1_fail_minsrcratio1.yarn.log
h2o1.6.1_spark1.6.1_suc_minsrcratio1.log

Aswin Jose Roy

Apr 11, 2016, 2:20:46 AM
to H2O Open Source Scalable Machine Learning - h2ostream
I am getting the same issue on Spark standalone 1.6.0. The command I use to submit is "spark-submit --master master-url --num-executors 4 --driver-memory 1g --executor-memory 2g --executor-cores 1 --conf spark.scheduler.minRegisteredResourcesRatio=1 --packages ai.h2o:sparkling-water-core_2.10:1.6.1 --supervise --class ..

I am getting the "Cloud size under 30" exception, with a lot of "executor killed" messages in the logs. What are the possible reasons?

Tom Kraljevic

Apr 11, 2016, 2:52:56 AM
to Aswin Jose Roy, H2O Open Source Scalable Machine Learning - h2ostream

Very likely OOM.
1g and 2g are very small; it's hard to do almost anything with that.
Try 5g to start.

Sent from my iPhone
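
A hedged sketch of that advice in code, for Spark standalone (spark.executor.memory set on a SparkConf before the context is created is honored there; driver memory still needs the --driver-memory flag, since the driver JVM is already running by the time the conf is read):

------------------------------
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative sizing only: undersized executors tend to die with
// OOM, which leaves the H2O cloud below its expected size.
val conf = new SparkConf()
  .setAppName("sparkling-water-app")   // illustrative name
  .set("spark.executor.memory", "5g")  // up from 2g, per the advice above
val sc = new SparkContext(conf)
------------------------------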

Aswin Jose Roy

Apr 11, 2016, 2:57:43 AM
to H2O Open Source Scalable Machine Learning - h2ostream
Actually, ignore the executor specifications. I've tried all kinds of specs.



omarsa...@gmail.com

Jun 5, 2018, 9:15:29 AM
to H2O Open Source Scalable Machine Learning - h2ostream
Hi Aswin,

Did you solve the issue? If so, can you explain how? I am stuck in the same loop.

Regards
Omar