spark-submit with Sparkling Water


Max

Mar 25, 2016, 5:41:30 PM
to H2O Open Source Scalable Machine Learning - h2ostream
Hi Guys,

I've created an uber jar including the Sparkling Water dependency sparkling-water-core_2.10 (version 1.6.1).

I'm getting the following error when running it with spark-submit:

java.lang.IllegalArgumentException: Executor without H2O instance discovered, killing the cloud!

$SPARK_HOME/bin/spark-submit --master yarn-client --class MyExecutorApp ~/artifacts/h2o/sparklingWaterTest.jar

I've installed Sparkling Water and I'm able to use Flow with my cluster (locally) and also to use spark-shell with Sparkling Water.
(Hadoop with Yarn, and Spark 1.6.1)

Not sure what I'm missing.

Many thanks,

Max

max.ka...@gmail.com

Mar 27, 2016, 8:43:22 PM
to H2O Open Source Scalable Machine Learning - h2ostream, max.ka...@gmail.com
I'm getting the following error when running it with spark-submit:

16/03/27 20:45:15 INFO SpreadRDDBuilder: Detected 1 spark executors for None H2O workers!
16/03/27 20:45:15 INFO H2OContext: Launching H2O on following 1 nodes: (0,127.0.0.1,-1,127.0.0.1)
16/03/27 20:45:15 INFO SparkContext: Starting job: collect at H2OContextUtils.scala:169
16/03/27 20:45:15 INFO DAGScheduler: Got job 7 (collect at H2OContextUtils.scala:169) with 1 output partitions
16/03/27 20:45:15 INFO DAGScheduler: Final stage: ResultStage 7 (collect at H2OContextUtils.scala:169)
16/03/27 20:45:15 INFO DAGScheduler: Parents of final stage: List()
16/03/27 20:45:15 INFO DAGScheduler: Missing parents: List()
16/03/27 20:45:15 INFO DAGScheduler: Submitting ResultStage 7 (MapPartitionsRDD[15] at map at H2OContextUtils.scala:106), which has no missing parents
16/03/27 20:45:15 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 2.7 KB, free 26.5 KB)
16/03/27 20:45:15 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 1788.0 B, free 28.2 KB)
16/03/27 20:45:15 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on 127.0.0.1:50238 (size: 1788.0 B, free: 511.5 MB)
16/03/27 20:45:15 INFO SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:1006
16/03/27 20:45:15 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 7 (MapPartitionsRDD[15] at map at H2OContextUtils.scala:106)
16/03/27 20:45:15 INFO TaskSchedulerImpl: Adding task set 7.0 with 1 tasks
16/03/27 20:45:15 INFO TaskSetManager: Starting task 0.0 in stage 7.0 (TID 327, localhost, partition 0,ANY, 2138 bytes)
16/03/27 20:45:15 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on localhost:60623 (size: 1788.0 B, free: 511.5 MB)
16/03/27 20:45:15 ERROR TaskSchedulerImpl: Lost executor 0 on localhost: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/03/27 20:45:15 WARN TaskSetManager: Lost task 0.0 in stage 7.0 (TID 327, localhost): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/03/27 20:45:15 INFO AppClient$ClientEndpoint: Executor updated: app-20160327204446-0021/0 is now EXITED (Command exited with code 50)
16/03/27 20:45:15 INFO SparkDeploySchedulerBackend: Executor app-20160327204446-0021/0 removed: Command exited with code 50
16/03/27 20:45:15 INFO DAGScheduler: Executor lost: 0 (epoch 0)
16/03/27 20:45:15 INFO SparkDeploySchedulerBackend: Asked to remove non-existent executor 0
16/03/27 20:45:15 INFO AppClient$ClientEndpoint: Executor added: app-20160327204446-0021/1 on worker-20160327181152-127.0.0.1-50288 (127.0.0.1:50288) with 2 cores
16/03/27 20:45:15 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160327204446-0021/1 on hostPort 127.0.0.1:50288 with 2 cores, 1024.0 MB RAM
16/03/27 20:45:15 INFO AppClient$ClientEndpoint: Executor updated: app-20160327204446-0021/1 is now RUNNING
16/03/27 20:45:15 INFO BlockManagerMasterEndpoint: Trying to remove executor 0 from BlockManagerMaster.
16/03/27 20:45:15 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(0, localhost, 60623)
16/03/27 20:45:15 INFO BlockManagerMaster: Removed 0 successfully in removeExecutor
16/03/27 20:45:19 INFO SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (localhost:33091) with ID 1
16/03/27 20:45:19 INFO TaskSetManager: Starting task 0.1 in stage 7.0 (TID 328, localhost, partition 0,ANY, 2138 bytes)
16/03/27 20:45:19 ERROR LiveListenerBus: Listener anon1 threw an exception


java.lang.IllegalArgumentException: Executor without H2O instance discovered, killing the cloud!

And in Spark Logs:

16/03/27 20:43:37 WARN master.Master: Got status update for unknown executor app-20160327204305-0020/4
16/03/27 20:43:37 INFO master.Master: localhost:49324 got disassociated, removing it.
16/03/27 20:43:37 INFO master.Master: 10.0.2.15:43192 got disassociated, removing it.
16/03/27 20:44:46 INFO master.Master: Registering app SparklingWaterTest
16/03/27 20:44:46 INFO master.Master: Registered app SparklingWaterTest with ID app-20160327204446-0021
16/03/27 20:44:46 INFO master.Master: Launching executor app-20160327204446-0021/0 on worker worker-20160327181152-127.0.0.1-50288
16/03/27 20:45:15 INFO master.Master: Removing executor app-20160327204446-0021/0 because it is EXITED
16/03/27 20:45:15 INFO master.Master: Launching executor app-20160327204446-0021/1 on worker worker-20160327181152-127.0.0.1-50288
16/03/27 20:45:21 INFO master.Master: Removing executor app-20160327204446-0021/1 because it is EXITED
16/03/27 20:45:21 INFO master.Master: Launching executor app-20160327204446-0021/2 on worker worker-20160327181152-127.0.0.1-50288
16/03/27 20:45:26 INFO master.Master: Removing executor app-20160327204446-0021/2 because it is EXITED
16/03/27 20:45:26 INFO master.Master: Launching executor app-20160327204446-0021/3 on worker worker-20160327181152-127.0.0.1-50288
16/03/27 20:45:32 INFO master.Master: Removing executor app-20160327204446-0021/3 because it is EXITED
16/03/27 20:45:32 INFO master.Master: Launching executor app-20160327204446-0021/4 on worker worker-20160327181152-127.0.0.1-50288
16/03/27 20:45:32 INFO master.Master: Received unregister request from application app-20160327204446-0021
16/03/27 20:45:32 INFO master.Master: Removing app app-20160327204446-0021
16/03/27 20:45:32 INFO spark.SecurityManager: Changing view acls to: max
16/03/27 20:45:32 INFO spark.SecurityManager: Changing modify acls to: max
16/03/27 20:45:32 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(max); users with modify permissions: Set(max)
16/03/27 20:45:32 INFO master.Master: localhost:49354 got disassociated, removing it.
16/03/27 20:45:32 INFO master.Master: 127.0.0.1:59815 got disassociated, removing it.
16/03/27 20:45:35 WARN master.Master: Got status update for unknown executor app-20160327204446-0021/4

Michal Malohlava

Mar 28, 2016, 2:06:53 PM
to h2os...@googlegroups.com
Hi Max,

Can you try passing `--conf spark.scheduler.minRegisteredResourcesRatio=1` to your spark-submit command?

Right now, we need to see all the resources that were asked for before H2O starts.

Thank you!
Michal
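For reference, Michal's suggestion applied to the spark-submit command from the first message would look roughly like this (the jar path and class name are taken from Max's original post):

```shell
# Original command plus the suggested option. Setting
# spark.scheduler.minRegisteredResourcesRatio=1 makes the scheduler wait
# until all requested executors have registered before scheduling tasks
# (and thus before H2O tries to form its cloud on the executors).
$SPARK_HOME/bin/spark-submit \
  --master yarn-client \
  --conf spark.scheduler.minRegisteredResourcesRatio=1 \
  --class MyExecutorApp \
  ~/artifacts/h2o/sparklingWaterTest.jar
```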

Michal Malohlava

Mar 28, 2016, 2:11:02 PM
to h2os...@googlegroups.com
Explanation: we installed a hook that shuts down the H2O cloud whenever the topology of the Spark cluster changes
(since we cannot currently extend a running H2O cluster on the fly).

Please try to specify `--conf spark.scheduler.minRegisteredResourcesRatio=1` option.

michal
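To make the mechanism concrete, here is an illustrative sketch of that hook (NOT the actual H2OContext source; the executor IDs are made up): the listener remembers which executors host H2O nodes, and any executor that registers later without an H2O instance triggers the shutdown seen in the stack traces above. In the logs earlier in this thread, executor 0 exits, the master relaunches the app on executor 1, and the listener then sees an executor with no H2O node.

```scala
// Illustrative sketch of the topology-change hook, modeled on the
// behavior seen in the logs above (not the real H2OContext code).
object TopologyHookSketch {
  // Executor IDs on which H2O nodes were launched during H2OContext
  // initialization (hypothetical value for illustration).
  val h2oExecutorIds = Set("0")

  // Mimics the listener's onExecutorAdded callback: a newly registered
  // executor carrying no H2O instance means the topology changed.
  def onExecutorAdded(executorId: String): Unit =
    if (!h2oExecutorIds.contains(executorId))
      throw new IllegalArgumentException(
        "Executor without H2O instance discovered, killing the cloud!")

  def main(args: Array[String]): Unit = {
    onExecutorAdded("0") // executor known to H2O: no complaint
    try onExecutorAdded("1") // replacement executor: triggers the error
    catch {
      case e: IllegalArgumentException => println("caught: " + e.getMessage)
    }
  }
}
```

With `minRegisteredResourcesRatio=1`, the idea is that all executors are present before H2O launches, so no "new" executor should appear afterwards.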

max.ka...@gmail.com

Mar 30, 2016, 3:35:46 PM
to H2O Open Source Scalable Machine Learning - h2ostream, mic...@h2oai.com

Hi Michal,

Thank you for getting back to me. I've tried running spark-submit with the --conf "spark.scheduler.minRegisteredResourcesRatio=1" option and I'm still getting the same error.

Thanks again,

Max

am...@dataculture.in

Apr 6, 2016, 9:22:07 AM
to H2O Open Source Scalable Machine Learning - h2ostream, max.ka...@gmail.com
I ran into the same problem while using Sparkling Water.

I'm getting the following error when running it with spark-submit:

java.lang.IllegalArgumentException: Executor without H2O instance discovered, killing the cloud!

at org.apache.spark.h2o.H2OContext$$anon$1.onExecutorAdded(H2OContext.scala:203)
at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:58)
at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1180)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)

Even though I specify --num-executors 5 --executor-memory 1g --executor-cores 1 during spark-submit, the problem persists.

Michal Malohlava

Apr 6, 2016, 12:44:56 PM
to h2os...@googlegroups.com

Hi there,

A few questions:

  • Which version of Spark / Sparkling Water did you use?
  • Which environment: YARN or standalone?
  • Did you specify `--conf spark.scheduler.minRegisteredResourcesRatio=1`?

Thx,
michal


max.ka...@gmail.com

Apr 6, 2016, 12:57:10 PM
to H2O Open Source Scalable Machine Learning - h2ostream, mic...@h2oai.com
Hi Michal,

Thanks for getting back to me.

I'm using Sparkling Water 1.6.1, spark-1.6.1-bin-hadoop2.6, and Hadoop 2.6.2 with YARN.
And I did specify --conf "spark.scheduler.minRegisteredResourcesRatio=1".

I hope it will help, and thanks again for your help!

Max

Aman Raj

Apr 6, 2016, 1:58:18 PM
to max.ka...@gmail.com, H2O Open Source Scalable Machine Learning - h2ostream, mic...@h2oai.com
Hi,
Thanks for your reply, Max.

The Sparkling Water version I am using is 1.6.1.

Environment: standalone.

Yes, I tried `--conf spark.scheduler.minRegisteredResourcesRatio=1`,
but I still get the same error.

Thanks,
Aman
