Current status of Sparkling Water?


Arun Luthra

Sep 26, 2014, 5:40:13 PM
to h2os...@googlegroups.com
Yesterday I tried to get Sparkling Water to work, but had problems getting the master + 1 slave example working. (*)

Is it possible yet to let Yarn and Spark handle all of the namenode/datanode/executor management for H2O? Will H2O be written using the Spark API in the future (or is it already, somewhere)?

Arun

(*): 02:38:16.342 main      INFO WATER: Creating REMOTE (spark://sandbox.hortonworks.com:7077) Spark context.
org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.

mic...@0xdata.com

Sep 26, 2014, 6:25:24 PM
to h2os...@googlegroups.com
Hi Arun,

thanks for trying Sparkling Water!

Could you please describe your use case in more detail?
Did you use the perrier version of Sparkling Water or the Tachyon-based version?

We are now working hard on improving Sparkling Water and preparing a new version called perrier.
In that version, Spark executors manage the H2O lifecycle; however, we have not yet properly tried the Yarn-based scenario.

Regarding the Spark API: we have a layer that is going to be Spark-compatible. The current state is here: https://github.com/0xdata/perrier, and here: https://github.com/0xdata/h2o-dev/tree/master/h2o-scala
Nevertheless, we are not going to rewrite our algorithms against the Spark API.

Best regards,
michal

Arun Luthra

Sep 26, 2014, 8:23:33 PM
to h2os...@googlegroups.com
Hi Michal,

I'd like to do supervised learning with an RDD input and two softmax outputs. Actually, the Spark aspect is appealing, but not required.

Ideally, I would not have to install stuff (H2O, node.js, Spark, etc.) on every node on the cluster. Yarn or something else would distribute my H2O task to all the workers on the cluster.

Arun Luthra

Sep 26, 2014, 8:26:40 PM
to h2os...@googlegroups.com
I used this version (tachyon): https://github.com/0xdata/h2o-sparkling


cli...@0xdata.com

Oct 1, 2014, 1:17:27 PM
to h2os...@googlegroups.com
H2O stand-alone does play nicely with Hadoop - see http://docs.0xdata.com/deployment/hadoop.html
It also works well with Yarn.
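
(For reference, the Hadoop deployment in those docs boils down to launching H2O as a MapReduce job via the driver jar, roughly like this - the jar name varies by Hadoop distribution, and the memory and node counts here are only illustrative:

hadoop jar h2odriver_hdp2.1.jar water.hadoop.h2odriver -libjars ../h2o.jar -mapperXmx 6g -nodes 3 -output hdfsOutputDir

See the linked page for the exact command for your distribution.)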

The newer version of Sparkling Water that Michal is referring to is much simpler to deploy, needing only an extra JAR passed alongside the normal Spark deployment. At this point in time it needs a custom build of Spark, the perrier repo mentioned above, in order to call H2O's initialization from Spark at the appropriate points. We think this situation will be short-lived, but it is needed for now.

Cliff

Michal Malohlava

Oct 28, 2014, 1:29:56 PM
to h2os...@googlegroups.com
Here is an update on the status of Sparkling Water:

 - Sparkling Water is now integrated at the application level. This means you do not need any modification of the Spark infrastructure, and you can use a regular Spark 1.1.0+ distribution
 - The project source is located at https://github.com/0xdata/sparkling-water and also contains pointers to download the latest version, 0.2.0
 - To run the examples or the Sparkling Shell (= a regular Spark Shell with an additional jar), you need to export SPARK_HOME and point it at your Spark distribution (a short session sketch follows below)
 - There is a simple demo described in https://github.com/0xdata/sparkling-water/blob/master/examples/README.md showing the use of H2O, the Spark RDD API, and Spark SQL
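
For anyone following along, a minimal Sparkling Shell session might look like the lines below. This is only a sketch based on the examples README above - the Spark path is a placeholder, and the exact API (H2OContext, its start() method, and the conversions brought in by import h2oContext._) should be checked against the 0.2.0 examples, since the API is still evolving:

export SPARK_HOME=/path/to/spark-1.1.0
bin/sparkling-shell

// then, inside the shell (Scala):
import org.apache.spark.h2o._
val h2oContext = new H2OContext(sc).start()  // launches H2O services inside the Spark executors
import h2oContext._                          // enables conversions between RDDs and H2O frames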

Enjoy and let us know any feedback!

Thank you for trying H2O!
michal

Ben Peters

Oct 31, 2014, 1:19:53 AM
to h2os...@googlegroups.com
Hi michal,
I am trying to get Sparkling Water working using the newest code on GitHub. When I run it locally, I can (almost) get through the example, but it fails on training the model, apparently because the executors are running out of memory. I'm not super familiar with Spark's local execution mode, but neither env variables nor the --executor-memory option seems to let me increase the executor memory size, which is set to 512 MB. I can increase the value in my MASTER configuration (e.g. local-cluster[3, 2, 2048]), but according to the log, this only increases the RAM of the workers, not the executors themselves.

When I run on a real cluster, the --executor-memory parameter works as expected again, but H2OContext.start fails. It runs through all the initialization on the cluster, and even displays the H2O startup message block (----- H2O started (client) ----- & info on ports/logging), but then it stalls in water.H2O.waitForCloudSize until the timeout hits, and I get a "Cloud size under 12" exception. I bumped over to one of the workers and tailed the application stderr log:

lots of:

java.lang.NullPointerException
    at water.TypeMap.NewFreezable(TypeMap.java:176)
    at water.AutoBuffer.get(AutoBuffer.java:665)
    at water.RPC.remote_exec(RPC.java:45)
    at water.FJPacket.compute2(FJPacket.java:23)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:200)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

any ideas?

thanks!
-ben

Michal Malohlava

Oct 31, 2014, 1:34:03 PM
to h2os...@googlegroups.com
Hi Ben!

Thanks for trying Sparkling Water!

Regarding your questions:
On 10/30/14 at 10:19 PM, Ben Peters wrote:
> I am trying to get Sparkling Water working using the newest code on GitHub. When I run it locally, I can (almost) get through the example, but it fails on training the model, apparently because the executors are running out of memory. I'm not super familiar with Spark's local execution mode, but neither env variables nor the --executor-memory option seems to let me increase the executor memory size, which is set to 512 MB. I can increase the value in my MASTER configuration (e.g. local-cluster[3, 2, 2048]), but according to the log, this only increases the RAM of the workers, not the executors themselves.
You need to pass the spark.executor.memory parameter to the Spark runtime.
For example, if you are running an example from the Sparkling Shell, you can use this command line:
bin/sparkling-shell --conf "spark.executor.memory=4g"

Or you can set the memory size directly in ${SPARK_HOME}/conf/spark-defaults.conf by appending the following line:
spark.executor.memory=4g
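
Or, if you are building your own driver program instead of using the shell, the same setting can go on the SparkConf directly. A minimal sketch - the app name here is just a placeholder:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("sparkling-water-example")  // placeholder application name
  .set("spark.executor.memory", "4g")     // same setting as above
val sc = new SparkContext(conf)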


> When I run on a real cluster, the --executor-memory parameter works as expected again, but H2OContext.start fails. It runs through all the initialization on the cluster, and even displays the H2O startup message block (----- H2O started (client) ----- & info on ports/logging), but then it stalls in water.H2O.waitForCloudSize until the timeout hits, and I get a "Cloud size under 12" exception. I bumped over to one of the workers and tailed the application stderr log:
>
> lots of:
>
> java.lang.NullPointerException
>     at water.TypeMap.NewFreezable(TypeMap.java:176)
>     at water.AutoBuffer.get(AutoBuffer.java:665)
>     at water.RPC.remote_exec(RPC.java:45)
>     at water.FJPacket.compute2(FJPacket.java:23)
>     at water.H2O$H2OCountedCompleter.compute(H2O.java:200)
>     at jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
>     at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
>     at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
>     at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
>     at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
>
> any ideas?
It seems like there is an inconsistency in exchanging type information.

This is really interesting! Can you send us logs from the Spark executors/workers and from the client? Plus a few more technical details
about your cloud: how many nodes, how much memory per executor, and is multicast enabled on your network?

Thanks a lot!
michal

powde...@gmail.com

Oct 31, 2014, 6:25:46 PM
to h2os...@googlegroups.com
On Friday, October 31, 2014 11:34:03 AM UTC-6, Michal Malohlava wrote:

>
> On 10/30/14 at 10:19 PM, Ben Peters wrote:

> You need to pass the spark.executor.memory parameter to the Spark runtime.
>
> For example, if you are running an example from the Sparkling Shell, you
> can use this command line:
>
> bin/sparkling-shell --conf "spark.executor.memory=4g"

Perfect, that looks good. I figured it was something simple like that, but I just didn't want to keep banging my head against the wall once --executor-memory wasn't working.


>
> This is really interesting! Can you send us logs from the Spark
> executors/workers and from the client? Plus a few more technical details
> about your cloud: how many nodes, how much memory per executor, and is
> multicast enabled on your network?
>

Definitely. Can I just email you the log directly?

As for the cloud: 12 nodes, with 30 GB of RAM per worker, but I'm going with the default (512 MB) of memory per executor just to get an H2OContext going. I've tried changing the executor memory, and it doesn't affect the errors I'm getting. I *believe* multicast is enabled on the network, but I might have to confirm that with an IT guy.

Thanks!