sparklyr & rsparkling via livy


Jclaude...@orange.fr

Aug 10, 2018, 5:55:08 AM
to H2O Open Source Scalable Machine Learning - h2ostream
Hello

I'm trying to use rsparkling through the Livy API, from my laptop against an HDP cluster.
But I don't know how to start the H2O internal backend from R, or whether it has to be done another way.

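library(sparklyr)
library(h2o)
library(dplyr)
library(httr)
# The Sparkling Water version must be set before rsparkling is loaded
options(rsparkling.sparklingwater.version = "2.1.0")
library(rsparkling)

# Disable SSL peer verification for the httr requests sparklyr sends to Livy
set_config(config(ssl_verifypeer = 0L))
config <- livy_config(...)
sc <- spark_connect(master = ..., method = "livy", config = config)
# The second argument of h2o_context() is strict_version_check
h2o_context(sc, strict_version_check = FALSE)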


Error: java.lang.ClassNotFoundException: org.apache.spark.h2o.H2OContext
at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)


Regards

laurend

Aug 10, 2018, 1:03:06 PM
Can you post the versions of Spark and H2O you have, so we can make sure they match the version of rsparkling you want to run?

You can find a table of the required versions, along with additional helpful instructions, here: https://github.com/h2oai/sparkling-water/tree/master/r

For reference, here is also a short discussion of Sparkling Water with Livy: https://github.com/h2oai/sparkling-water/issues/449

Jclaude...@orange.fr

Aug 13, 2018, 10:34:09 AM
Thanks for pointing this out. I have matched the Sparkling Water assembly (sparkling-water-assembly_2.11-2.1.33-all.jar) to my Spark version (2.1.1) and passed it with the jars parameter of livy_config:

config <- livy_config(username = "", password = "", jars = "hdfs:/tmp/mylibs/sparkling-water-assembly_2.11-2.1.33-all.jar")
sc <- spark_connect(master = "livy rest url", method = "livy", config = config, version = "2.1.1")

I get one warning:
In livy_connection_not_used_warn(version) : Livy connections do not support version parameter
but the Livy session is up.

h2o_context(sc, strict_version_check = FALSE)
Error in h2o.init(ip = ip, port = port, strict_version_check = strict_version_check, :
Cannot connect to H2O server. Please check that H2O is running at http://10.79.xx.xx:54321/

When I try to initialize the H2O backend, it fails. The logs in the resource manager are not very clear, but I suspect an IP problem/conflict caused by multiple network interfaces on each node. Is there any way to force the IP or the network mask?




laurend

Aug 13, 2018, 7:18:34 PM
A few more follow-up questions:
Can you see H2O running in your browser if you open the Flow URL http://10.79.xx.xx:54321/ ?
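The same reachability check can be scripted from the laptop running R. This is a minimal sketch, assuming the httr package (which the original script already loads) and H2O's GET /3/Cloud cluster-status REST endpoint; h2o_reachable() is a hypothetical helper, not part of h2o or rsparkling. If it returns FALSE for the address in the error message, the laptop cannot reach the H2O node at all, which points to a network issue rather than to H2O itself.

```r
library(httr)

# Hypothetical helper: TRUE if an H2O REST API answers at `base_url`.
# It queries /3/Cloud (H2O's cluster-status endpoint) with a short timeout.
h2o_reachable <- function(base_url, timeout_sec = 5) {
  url <- paste0(sub("/+$", "", base_url), "/3/Cloud")
  resp <- tryCatch(
    GET(url, timeout(timeout_sec)),
    error = function(e) NULL  # connection refused, timeout, unknown host, ...
  )
  !is.null(resp) && status_code(resp) == 200
}

# Usage, with the (elided) address reported by h2o.init():
# h2o_reachable("http://10.79.xx.xx:54321/")
```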

For the available launch options, take a look here: http://docs.h2o.ai/sparkling-water/2.1/latest-stable/doc/configuration/configuration_properties.html

A parameter that could be useful is `spark.ext.h2o.client.network.mask`.
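A minimal sketch of how that property might be passed from R through Livy. It assumes that your sparklyr version's livy_config() accepts the Livy session parameter conf (a named list of Spark properties) alongside jars. The subnet below is a hypothetical placeholder in CIDR notation, not a value taken from this thread.

```r
library(sparklyr)
library(rsparkling)

# Assumption: `conf` is forwarded to the Livy session as Spark properties
livy_conf <- livy_config(
  username = "",
  password = "",
  jars = "hdfs:/tmp/mylibs/sparkling-water-assembly_2.11-2.1.33-all.jar",
  conf = list(
    # Hypothetical subnet: replace with the network the H2O client should bind to
    "spark.ext.h2o.client.network.mask" = "192.168.0.0/24"
  )
)

sc <- spark_connect(master = "livy rest url", method = "livy", config = livy_conf)
h2o_context(sc, strict_version_check = FALSE)
```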

So that we can help further, could you post your logs and give a few more details about the general architecture you are working in?


Thanks!

Lauren