Running sparkling-water application on Apache Spark with --master yarn-client : ISSUE


Devesh Kandpal

Mar 6, 2016, 10:55:11 PM
to H2O Open Source Scalable Machine Learning - h2ostream
Hi,


I'm running my Sparkling Water application with: ./spark-submit --master yarn-client --class className.class pathToJar.jar

My Hadoop pseudo-distributed configuration has a ResourceManager, with a single NameNode and a DataNode.

I've tried running ./spark-shell --master yarn-client, and I get a Scala prompt without any errors.

I'm bundling all H2O-related jars as dependencies in an uber jar. I'm using Apache Spark 1.6 (tried with Spark 1.5 as well; it did not work) with Hadoop 2.6.3 and Sparkling Water 1.5.10 dependencies. I get the following exception:

Exception in thread "main" java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext
        at org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:104)
        at org.apache.spark.SparkContext.getExecutorStorageStatus(SparkContext.scala:1546)
        at org.apache.spark.h2o.H2OContext.numOfSparkExecutors(H2OContext.scala:150)
        at org.apache.spark.h2o.H2OContext.createSpreadRDD(H2OContext.scala:255)
        at org.apache.spark.h2o.H2OContext.start(H2OContext.scala:183)
        at amo.AMOCognitive$.main(AMOCognitive.scala:63)
        at amo.AMOCognitive.main(AMOCognitive.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)





The error I get at line 63 is where I initialize the H2OContext. Find the relevant code below:



import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.h2o._

// Configure and start the Spark context
val conf = new SparkConf()
  .setAppName("AMO Cognitive Tool")
  .set("spark.logConf", "false")
val sc = new SparkContext(conf)

implicit val sqlContext = SQLContext.getOrCreate(sc)
import sqlContext.implicits._

// Start H2O services on top of the Spark cluster (this is line 63)
val h2oContext = H2OContext.getOrCreate(sc).start()


I have even tried changing the last line above to:

  val h2oContext = new H2OContext(sc).start()


However, this did not help either.



Could someone please help me out with this?

Appreciate your time and help.



Regards,
Devesh.



Michal Malohlava

Mar 7, 2016, 1:05:26 PM
to h2os...@googlegroups.com
Hi Devesh,

The Sparkling Water 1.5.10 libraries are designed for Spark 1.5.x.
We are preparing the 1.6 release right now.

However, did you see the same error with Spark 1.5?
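In the meantime, if you stay on Spark 1.5.x, make sure the uber jar pins a matching pair of versions. Here is a minimal build.sbt sketch; the Scala binary version (2.10), the Spark patch version, and the exact artifact coordinates are my assumptions, so verify them against Maven Central:

// build.sbt - pin Spark 1.5.x together with Sparkling Water 1.5.10 (sketch)
scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  // Spark itself is provided by the cluster, so keep it out of the uber jar
  "org.apache.spark" %% "spark-core" % "1.5.2" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.5.2" % "provided",
  // Sparkling Water release line matching Spark 1.5.x
  "ai.h2o" %% "sparkling-water-core" % "1.5.10"
)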

I would recommend that you:

  • increase the available memory in the driver and the executors (options spark.driver.memory or spark.yarn.am.memory, and spark.executor.memory),
  • make the cluster homogeneous - use the same memory value for the driver and the executors,
  • increase the PermGen size if you are running on top of Java 7 (options spark.driver.extraJavaOptions or spark.yarn.am.extraJavaOptions, and spark.executor.extraJavaOptions),
  • in rare cases, it helps to increase spark.yarn.driver.memoryOverhead, spark.yarn.am.memoryOverhead, or spark.yarn.executor.memoryOverhead (see the example command after this list).
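For example, a spark-submit invocation along these lines; the 4g and 512 values are placeholders to adjust for your cluster, and the class/jar names are the ones from your original command:

./spark-submit --master yarn-client \
  --driver-memory 4g \
  --executor-memory 4g \
  --driver-java-options "-XX:MaxPermSize=384m" \
  --conf spark.executor.extraJavaOptions=-XX:MaxPermSize=384m \
  --conf spark.yarn.executor.memoryOverhead=512 \
  --class className.class pathToJar.jar

Note that in yarn-client mode the driver JVM is the spark-submit process itself, so its memory and JVM options must be passed through --driver-memory and --driver-java-options rather than --conf.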

For running Sparkling Water on top of YARN:

  • make sure that YARN provides stable containers; do not use a preemptive YARN scheduler,
  • make sure that the Spark application master has enough memory, and increase its PermGen size as well,
  • in case of a container failure, YARN should not restart the container, and the application should terminate gracefully.

Furthermore, we recommend configuring the following Spark properties to speed up and stabilize the creation of H2O services on top of the Spark cluster (a combined spark-submit example follows the list):

  • spark.locality.wait - context: all; value: 3000. The number of milliseconds to wait for a task launch on a data-local node before falling back to a less-local one. We recommend increasing it to make sure that H2O tasks are processed locally with their data.
  • spark.scheduler.minRegisteredResourcesRatio - context: all; value: 1. Make sure that Spark starts scheduling only once it sees 100% of the resources.
  • spark.task.maxFailures - context: all; value: 1. Do not retry failed tasks.
  • spark.driver.extraJavaOptions, spark.executor.extraJavaOptions, and spark.yarn.am.extraJavaOptions - context: all; value: -XX:MaxPermSize=384m. Increase the PermGen size if you are running on Java 7. Make sure to configure it on the driver, the executors, and the YARN application master.
  • spark.yarn.driver.memoryOverhead, spark.yarn.am.memoryOverhead, and spark.yarn.executor.memoryOverhead - context: yarn; value: increase as needed. Increase the memory overhead if necessary.
  • spark.yarn.max.executor.failures - context: yarn; value: 1. Do not restart executors after a failure; fail the computation directly instead.
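Put together, a sketch of passing these settings on the command line; the values simply mirror the list above and are starting points rather than tuned numbers:

./spark-submit --master yarn-client \
  --conf spark.locality.wait=3000 \
  --conf spark.scheduler.minRegisteredResourcesRatio=1 \
  --conf spark.task.maxFailures=1 \
  --conf spark.executor.extraJavaOptions=-XX:MaxPermSize=384m \
  --conf spark.yarn.am.extraJavaOptions=-XX:MaxPermSize=384m \
  --conf spark.yarn.max.executor.failures=1 \
  --class className.class pathToJar.jar

The same properties can also live in conf/spark-defaults.conf if you prefer not to repeat them on every submit.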

Michal
