java.lang.RuntimeException: Cloud size under 50

Jonghoon Joey Ahnn

Mar 16, 2016, 3:24:39 AM
to H2O Open Source Scalable Machine Learning - h2ostream
I am running sparkling water 1.5.3 on HDP 2.2.4. Sometimes (not always) I am getting the following error.
Any idea what the "Cloud size under 50" error I got means?

===================================================
bash-4.1$ ./sparkling-water-shell.sh

-----
  Spark master (MASTER)     : local-cluster[10,2,8192]
  Spark home   (SPARK_HOME) : /userapps/hadoop/spark-1.6.0
  H2O build version         : 3.2.0.9 (slater)
  Spark build version       : 1.5.0
----

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
16/03/16 02:16:59 WARN HiveConf: HiveConf of name hive.optimize.mapjoin.mapreduce does not exist
16/03/16 02:16:59 WARN HiveConf: HiveConf of name hive.heapsize does not exist
16/03/16 02:16:59 WARN HiveConf: HiveConf of name hive.semantic.analyzer.factory.impl does not exist
16/03/16 02:16:59 WARN HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
16/03/16 02:16:59 WARN HiveConf: HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist
Spark context available as sc.
16/03/16 02:17:26 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/03/16 02:17:26 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
SQL context available as sqlContext.
Loading /home_dir/svdfe001/var/ts-features-wf/test.scala...
import com.typesafe.config._
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions._
import org.apache.spark.sql.DataFrameNaFunctions
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.hive.orc._
import org.apache.spark.sql._
import org.apache.spark.h2o._
import water._
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.WindowSpec
import org.apache.spark.sql.Column
import org.apache.spark.sql.Row
16/03/16 02:17:48 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@7b919e50
import sqlContext.implicits._
16/03/16 02:17:55 WARN H2OContext: Increasing 'spark.locality.wait' to value 30000
java.lang.RuntimeException: Cloud size under 50                                 
at water.H2O.waitForCloudSize(H2O.java:1374)
at org.apache.spark.h2o.H2OContext.start(H2OContext.scala:154)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:64)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:69)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:71)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:73)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:75)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:77)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:79)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:81)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:83)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:85)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:87)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:89)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:91)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:93)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:95)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:97)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:99)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:101)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:103)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:105)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:107)
at $iwC$$iwC$$iwC.<init>(<console>:109)
at $iwC$$iwC.<init>(<console>:111)
at $iwC.<init>(<console>:113)
at <init>(<console>:115)
at .<init>(<console>:119)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(SparkILoop.scala:680)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$mcV$sp$2.apply(SparkILoop.scala:677)
at scala.reflect.io.Streamable$Chars$class.applyReader(Streamable.scala:104)
at scala.reflect.io.File.applyReader(File.scala:82)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SparkILoop.scala:677)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply(SparkILoop.scala:677)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1$$anonfun$apply$mcV$sp$1.apply(SparkILoop.scala:677)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$savingReplayStack(SparkILoop.scala:162)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1.apply$mcV$sp(SparkILoop.scala:676)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1.apply(SparkILoop.scala:676)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$interpretAllFrom$1.apply(SparkILoop.scala:676)
at org.apache.spark.repl.SparkILoop.savingReader(SparkILoop.scala:167)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$interpretAllFrom(SparkILoop.scala:675)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$loadCommand$1.apply(SparkILoop.scala:740)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$loadCommand$1.apply(SparkILoop.scala:739)
at org.apache.spark.repl.SparkILoop.withFile(SparkILoop.scala:733)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loadCommand(SparkILoop.scala:739)
at org.apache.spark.repl.SparkILoop$$anonfun$standardCommands$7.apply(SparkILoop.scala:344)
at org.apache.spark.repl.SparkILoop$$anonfun$standardCommands$7.apply(SparkILoop.scala:344)
at scala.tools.nsc.interpreter.LoopCommands$LineCmd.apply(LoopCommands.scala:81)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:809)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$loadFiles$1.apply(SparkILoop.scala:910)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$loadFiles$1.apply(SparkILoop.scala:908)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loadFiles(SparkILoop.scala:908)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:995)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Michal Malohlava

Mar 16, 2016, 4:48:55 PM
to h2os...@googlegroups.com
Hi Joey,

the problem is that H2O cannot figure out the topology of the Spark cluster.

You should:
  - use a Sparkling Water version compatible with Spark 1.6 (see http://www.h2o.ai/download/sparkling-water/choose or http://h2o-release.s3.amazonaws.com/sparkling-water/rel-1.6/1/index.html), or stay on Spark 1.5 if you need the older version of Sparkling Water

  - look into the Sparkling Water tuning guide: https://github.com/h2oai/sparkling-water/blob/master/DEVEL.md#SparklingWaterTuning

  - I would recommend passing the following configuration option to your job:
   `--conf spark.scheduler.minRegisteredResourcesRatio=1`. It forces Spark to wait until all resources are available, which helps H2O detect the cloud topology (a minimal sketch follows below).
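
For illustration, here is a minimal sketch of the same setting applied programmatically (assuming the Sparkling Water 1.5.x API, where the H2OContext is built from the SparkContext and started explicitly; the app name is illustrative):

------------------------------
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.h2o.H2OContext

// Force Spark to wait until 100% of the requested executor resources
// have registered before scheduling starts, so H2O sees the whole
// cluster when it forms its cloud.
val conf = new SparkConf()
  .setAppName("sparkling-water-example")  // illustrative name
  .set("spark.scheduler.minRegisteredResourcesRatio", "1")

val sc = new SparkContext(conf)

// Sparkling Water 1.5.x style; the stack trace above shows the
// cloud-size check happens inside this start() call.
val h2oContext = new H2OContext(sc).start()
------------------------------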

Please let us know if it helps!
We are working on redesigning the clouding strategy, so any feedback is welcome.

Thank you!
michal

Jonghoon Joey Ahnn

Mar 16, 2016, 5:58:46 PM
to H2O Open Source Scalable Machine Learning - h2ostream, mic...@h2oai.com
Hi Michal,

I applied the Spark property you suggested and changed to Spark 1.5 with sparkling-water 1.5.3.
Unfortunately, I am getting the same cloud size error. It seems that neither spark.scheduler.maxRegisteredResourcesWaitingTime nor spark.scheduler.minRegisteredResourcesRatio helps to resolve the issue.

My script is the following:
------------------------------
bin/sparkling-shell -i /home_dir/svdfe001/var/ts-features-wf/tmp/test.scala \
 --master yarn \
 --conf spark.executor.memory=16g \
 --conf spark.driver.memory=16g \
 --conf spark.driver.maxResultSize=8g \
 --conf spark.executor.instances=200 \
 --conf spark.yarn.queue=analysis \
 --conf spark.scheduler.maxRegisteredResourcesWaitingTime=1000000 \
 --conf spark.scheduler.minRegisteredResourcesRatio=1
------------------------------
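
As a quick sanity check (note that the banner in the log below still reports local-cluster[10,2,8192] as the master even though the script passes --master yarn), the effective settings can be queried from the running shell; a hypothetical REPL snippet:

------------------------------
// Hypothetical REPL check: confirm the master and executor settings
// Spark actually registered before H2OContext is started.
println(sc.master)  // expect a "yarn..." master, not "local-cluster[...]"
println(sc.getConf.get("spark.executor.instances", "<unset>"))
println(sc.getConf.get("spark.scheduler.minRegisteredResourcesRatio", "<unset>"))
------------------------------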

The error log looks like this:
-----
  Spark master (MASTER)     : local-cluster[10,2,8192]
  Spark home   (SPARK_HOME) : /userapps/hadoop/spark-1.5.0-rc3
  H2O build version         : 3.2.0.9 (slater)
  Spark build version       : 1.5.0
----

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
16/03/16 16:46:49 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
Spark context available as sc.
16/03/16 16:47:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/16 16:47:39 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
SQL context available as sqlContext.
Loading /home_dir/svdfe001/var/ts-features-wf/tmp/test.scala...
import com.typesafe.config._
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions._
import org.apache.spark.sql.DataFrameNaFunctions
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.hive.orc._
import org.apache.spark.sql._
import org.apache.spark.h2o._
import water._
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.WindowSpec
import org.apache.spark.sql.Column
import org.apache.spark.sql.Row
16/03/16 16:47:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/16 16:48:05 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@cdc38be
import sqlContext.implicits._
16/03/16 16:48:07 WARN H2OContext: Increasing 'spark.locality.wait' to value 30000
java.lang.RuntimeException: Cloud size under 200                                
at water.H2O.waitForCloudSize(H2O.java:1374)
at org.apache.spark.h2o.H2OContext.start(H2OContext.scala:154)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:58)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:63)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:65)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:67)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:69)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:71)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.

Michal Malohlava

Mar 16, 2016, 7:22:38 PM
to Jonghoon Joey Ahnn, H2O Open Source Scalable Machine Learning - h2ostream
Hi Joey,

can you try the latest Sparkling Water 1.5.12 (or 1.6.1 for Spark 1.6)?

Michal

mic...@0xdata.com

Mar 24, 2016, 1:33:19 PM
to H2O Open Source Scalable Machine Learning - h2ostream, jha...@gmail.com, mic...@h2oai.com
Hi Joey,

any progress?

Thank you!
Michal

Jonghoon Joey Ahnn

Mar 28, 2016, 1:04:42 PM
to mic...@0xdata.com, H2O Open Source Scalable Machine Learning - h2ostream, mic...@h2oai.com
Hi Michal,

Our team upgraded from HDP 2.2.4 to HDP 2.3.4, and I am having a serious H2O context invocation issue.

I used sparkling-water 1.5.3 on HDP 2.2.4 without problems earlier.
After the upgrade, I tested different versions of H2O (sparkling-water) on Spark 1.5 and 1.6 on HDP 2.3.4. The logs are attached.
As you can see, there were both successful and failed runs on the same setting. I tested each setting with about 10 runs:
  • sparkling-water 1.5.3 with Spark 1.5: about 2/3 of runs failed
  • sparkling-water 1.5.3 with Spark 1.5.0-rc3: about 90% of runs failed
  • sparkling-water 1.5.12 with Spark 1.5: about 2/3 of runs failed
  • sparkling-water 1.5.12 with Spark 1.5.0-rc3: no more than 3 consecutive runs were successful
  • sparkling-water 1.6.1 with Spark 1.6.1-rc1: no successful runs were observed
Has anyone reported an issue like mine on your side?
Thanks.
-Joey

h2o1.5.3_spark1.5_fail.log
h2o1.5.3_spark1.5_suc.log
h2o1.5.3_spark1.5.0-rc3_fail.log
h2o1.5.3_spark1.5.0-rc3_succ.log
h2o1.5.12_spark1.5_fail.log
h2o1.5.12_spark1.5_succ.log
h2o1.5.12_spark1.5.0-rc3_fail.log
h2o1.5.12_spark1.5.0-rc3_succ.log
h2o1.6.1_spark1.6.1-rc1_fail.log

Michal Malohlava

Mar 28, 2016, 1:55:43 PM
to Jonghoon Joey Ahnn, mic...@0xdata.com, H2O Open Source Scalable Machine Learning - h2ostream
Hi Joey,

thanks for all the information! It is really useful!

Regarding 1.5.12 - did you try to pass `--conf spark.scheduler.minRegisteredResourcesRatio=1` to force Spark to wait for all resources?

Regarding 1.6 - we saw similar behavior in our HDP infrastructure; however, after fixing H2ORDD locality the problems disappeared (tested on HDP 2.2 and 2.4).
It would be really helpful if you could share the YARN logs.

Thank you for your feedback, and sorry for the inconvenience,
Michal



Jonghoon Joey Ahnn

Mar 28, 2016, 3:40:25 PM
to mic...@h2oai.com, mic...@0xdata.com, H2O Open Source Scalable Machine Learning - h2ostream
Hi Michal,

I further tested sparkling-water 1.5.12 with various Spark versions with "--conf spark.scheduler.minRegisteredResourcesRatio=1". The logs are attached. In summary, nothing worked for me, though as I experienced earlier, other runs on the same settings may occasionally succeed.

For sparkling-water 1.6.1, I got both success and failure logs, which are attached. For the failure, I am also attaching the YARN log (h2o1.6.1_spark1.6.1_fail_minsrcratio1.yarn.log).

Do you think this is an issue only with HDP 2.3? Is there any reason you skipped testing on HDP 2.3?
I hope this helps to dig into the issue.

Thanks.
-Joey

h2o1.5.12_spark1.5.0_fail_minsrcratio1.log
h2o1.5.12_spark1.5.0-rc3_fail_minsrcratio1.log
h2o1.5.12_spark1.5.1_fail_minsrcratio1.log
h2o1.5.12_spark1.6.0_fail_minsrcratio1.log
h2o1.5.12_spark1.6.1_fail_minsrcratio1.log
h2o1.6.1_spark1.6.1_fail_minsrcratio1.log
h2o1.6.1_spark1.6.1_fail_minsrcratio1.yarn.log
h2o1.6.1_spark1.6.1_suc_minsrcratio1.log

Aswin Jose Roy

Apr 11, 2016, 2:20:46 AM
to H2O Open Source Scalable Machine Learning - h2ostream
I am getting the same issue on Spark standalone 1.6.0. The command I use to submit is "spark-submit --master master-url --num-executors 4 --driver-memory 1g --executor-memory 2g --executor-cores 1 --conf spark.scheduler.minRegisteredResourcesRatio=1 --packages ai.h2o:sparkling-water-core_2.10:1.6.1 --supervise --class ..

I am getting the "Cloud size under 30" exception, with a lot of "executor killed" messages in the logs. What are the possible reasons?

Tom Kraljevic

Apr 11, 2016, 2:52:56 AM
to Aswin Jose Roy, H2O Open Source Scalable Machine Learning - h2ostream

Very likely OOM.
1g and 2g are very small; it's hard to do almost anything with that.
Try 5g to start.

Sent from my iPhone
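
A hedged sketch of that advice in code, for Spark standalone (spark.executor.memory set on a SparkConf before the context is created is honored there; driver memory still needs the --driver-memory flag, since the driver JVM is already running by the time the conf is read):

------------------------------
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative sizing only: undersized executors tend to die with
// OOM, which leaves the H2O cloud below its expected size.
val conf = new SparkConf()
  .setAppName("sparkling-water-app")   // illustrative name
  .set("spark.executor.memory", "5g")  // up from 2g, per the advice above
val sc = new SparkContext(conf)
------------------------------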

Aswin Jose Roy

Apr 11, 2016, 2:57:43 AM
to H2O Open Source Scalable Machine Learning - h2ostream
Actually, ignore the executor specifications. I've tried all kinds of specs.



omarsa...@gmail.com

Jun 5, 2018, 9:15:29 AM
to H2O Open Source Scalable Machine Learning - h2ostream
Hi Aswin,

Did you solve the issue? If so, can you explain how? I am stuck in the same loop.

Regards
Omar