Using pysparkling through an IDE


ankit....@gmail.com

Dec 3, 2015, 4:28:59 PM
to H2O Open Source Scalable Machine Learning - h2ostream
I have pysparkling installed and running both through a notebook and the shell. However, I am not sure how to configure my IDE, i.e., PyCharm, to use the H2OContext. I am able to use PyCharm with pyspark, and I had to configure some paths for it.

It might be some path variable that I am missing, but my code fails on

hc = H2OContext(sc).start()


Script:

import os
import sys
from pysparkling import *

spark_home = os.environ['SPARK_HOME']

# Now we are ready to import the Spark modules
try:
    from pyspark import SparkContext
    from pyspark import SparkConf
    from pyspark.mllib.fpm import FPGrowth
    import h2o
    print("Successfully imported all Spark modules")
except ImportError as e:
    print("Error importing Spark modules", e)
    sys.exit(1)

sc = SparkContext("local", "Simple App")
print(sc)
hc = H2OContext(sc).start()
print(hc)

Michal Malohlava

Dec 3, 2015, 6:49:33 PM
to h2os...@googlegroups.com, ja...@h2o.ai
Greetings!

Thank you for trying PySparkling!

Honestly, so far all of the PySparkling development has been done in Vim.
But for an IDE, you have to make sure that the IDE can access the pyspark, pysparkling, and h2o packages.
Right now I append the H2O package to PYTHONPATH (using it directly from the H2O repository):
`export PYTHONPATH=$PYTHONPATH:$H2O_HOME/h2o-py`
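For illustration, here is a rough sketch of what the equivalent path setup could look like at the top of a script run from PyCharm. The locations are assumptions (the py4j zip name is taken from the Spark 1.5.2 distribution in your log); adjust them to your installation:

import os
import sys

# Assumed locations -- adjust to your installation.
spark_home = os.environ['SPARK_HOME']
h2o_home = os.environ.get('H2O_HOME', '/path/to/h2o-repo')  # placeholder

# Make pyspark and py4j importable (zip name matches the Spark 1.5.2 distribution).
sys.path.append(os.path.join(spark_home, 'python'))
sys.path.append(os.path.join(spark_home, 'python', 'lib', 'py4j-0.8.2.1-src.zip'))

# Make the h2o package importable straight from the repository checkout.
sys.path.append(os.path.join(h2o_home, 'h2o-py'))

# pysparkling has to be importable as well -- either installed into the
# interpreter PyCharm uses, or its location appended here in the same way.

Equivalently, the same directories can go into PYTHONPATH in PyCharm's run configuration instead of being appended in code.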

Feel free to comment on this strategy and propose any improvements! Any comments are welcome!
I am also CC'ing Kuba in case he is able to provide more information.

In the meantime, can you please provide the error (stack trace) for the code failure?

Thank you!
Michal

Ankit Arya

Dec 3, 2015, 7:47:29 PM
to mic...@h2oai.com, h2os...@googlegroups.com, ja...@h2o.ai
Hi Michal,

I am using the Python Anaconda distribution. I am able to run Spark jobs, so pyspark is OK. For h2o, I can successfully do h2o.init() to set up a connection, so that seems OK too. So it pretty much comes down to pysparkling.

As you can see in the logs, it is not able to find the H2OContext class.

The whole log:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/12/03 16:22:34 INFO SparkContext: Running Spark version 1.5.2
15/12/03 16:22:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/12/03 16:22:34 INFO SecurityManager: Changing view acls to: aarya
15/12/03 16:22:34 INFO SecurityManager: Changing modify acls to: aarya
15/12/03 16:22:34 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(aarya); users with modify permissions: Set(aarya)
15/12/03 16:22:35 INFO Slf4jLogger: Slf4jLogger started
15/12/03 16:22:35 INFO Remoting: Starting remoting
15/12/03 16:22:35 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark...@10.10.0.60:64641]
15/12/03 16:22:35 INFO Utils: Successfully started service 'sparkDriver' on port 64641.
15/12/03 16:22:35 INFO SparkEnv: Registering MapOutputTracker
15/12/03 16:22:35 INFO SparkEnv: Registering BlockManagerMaster
15/12/03 16:22:35 INFO DiskBlockManager: Created local directory at /private/var/folders/x3/sl62gc8n74l_cl299s2yxxyc0000gp/T/blockmgr-750eaf25-f633-428e-87be-db309e1efc2b
15/12/03 16:22:35 INFO MemoryStore: MemoryStore started with capacity 530.0 MB
15/12/03 16:22:35 INFO HttpFileServer: HTTP File server directory is /private/var/folders/x3/sl62gc8n74l_cl299s2yxxyc0000gp/T/spark-fe26f95f-ac1f-44b5-b400-406a26e02014/httpd-b8635885-5656-4890-8e5f-f0f3c0916044
15/12/03 16:22:35 INFO HttpServer: Starting HTTP Server
15/12/03 16:22:35 INFO Utils: Successfully started service 'HTTP file server' on port 64642.
15/12/03 16:22:35 INFO SparkEnv: Registering OutputCommitCoordinator
15/12/03 16:22:35 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
15/12/03 16:22:35 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
15/12/03 16:22:35 INFO Utils: Successfully started service 'SparkUI' on port 4042.
15/12/03 16:22:35 INFO SparkUI: Started SparkUI at http://10.10.0.60:4042
15/12/03 16:22:35 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/12/03 16:22:35 INFO Executor: Starting executor ID driver on host localhost
15/12/03 16:22:35 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 64643.
15/12/03 16:22:35 INFO NettyBlockTransferService: Server created on 64643
15/12/03 16:22:35 INFO BlockManagerMaster: Trying to register BlockManager
15/12/03 16:22:35 INFO BlockManagerMasterEndpoint: Registering block manager localhost:64643 with 530.0 MB RAM, BlockManagerId(driver, localhost, 64643)
15/12/03 16:22:35 INFO BlockManagerMaster: Registered BlockManager
Traceback (most recent call last):
  File "/Users/aarya/PycharmProjects/Decision-Lists/Pyspark_setup.py", line 20, in <module>
    hc= H2OContext(sc).start()
  File "build/bdist.linux-x86_64/egg/pysparkling/context.py", line 72, in __init__
  File "build/bdist.linux-x86_64/egg/pysparkling/context.py", line 96, in _do_init
  File "/Users/aarya/software/spark-1.5.2-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/Users/aarya/software/spark-1.5.2-bin-hadoop2.6/python/pyspark/sql/utils.py", line 36, in deco
    return f(*a, **kw)
  File "/Users/aarya/software/spark-1.5.2-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o21.loadClass.
: java.lang.ClassNotFoundException: org.apache.spark.h2o.H2OContext
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/Users/aarya/anaconda/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/Users/aarya/anaconda/lib/python2.7/site-packages/h2o/connection.py", line 565, in end_session
    H2OConnection.delete(url_suffix="InitID")
  File "/Users/aarya/anaconda/lib/python2.7/site-packages/h2o/connection.py", line 392, in delete
    raise ValueError("No h2o connection. Did you run `h2o.init()` ?")
ValueError: No h2o connection. Did you run `h2o.init()` ?
Error in sys.exitfunc:
Traceback (most recent call last):
  File "/Users/aarya/anaconda/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/Users/aarya/anaconda/lib/python2.7/site-packages/h2o/connection.py", line 565, in end_session
    H2OConnection.delete(url_suffix="InitID")
  File "/Users/aarya/anaconda/lib/python2.7/site-packages/h2o/connection.py", line 392, in delete
    raise ValueError("No h2o connection. Did you run `h2o.init()` ?")
ValueError: No h2o connection. Did you run `h2o.init()` ?
15/12/03 16:22:36 INFO SparkContext: Invoking stop() from shutdown hook

Thanks,
Ankit 

--
Ankit Arya,
Data Scientist,
Collective[i].

Michal Malohlava

Dec 4, 2015, 2:08:17 PM
to Ankit Arya, h2os...@googlegroups.com, ja...@h2o.ai
Hi Ankit,

you are missing the Sparkling Water library (in the downloaded Sparkling Water distribution you can find it in assembly/build/libs).
You need to inform Spark about its location via the --jars option of spark-shell or spark-submit,
or you can use the Spark packages option directly:
--packages ai.h2o:sparkling-water-core_2.10:1.5.6,ai.h2o:sparkling-water-examples_2.10:1.5.6
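Since an IDE run does not go through spark-submit on the command line, one possible way to pass these options is to set PYSPARK_SUBMIT_ARGS in the environment before the SparkContext is created, so the gateway JVM starts with the assembly jar (or packages) on its classpath. A minimal sketch, with a placeholder jar path:

import os

# Must be set before the SparkContext is created; the trailing "pyspark-shell"
# is required by PySpark when these args are supplied programmatically.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--jars /path/to/sparkling-water-assembly.jar pyspark-shell'
)
# Or, equivalently, pull Sparkling Water from the Spark packages repository:
# os.environ['PYSPARK_SUBMIT_ARGS'] = (
#     '--packages ai.h2o:sparkling-water-core_2.10:1.5.6,'
#     'ai.h2o:sparkling-water-examples_2.10:1.5.6 pyspark-shell'
# )

from pyspark import SparkContext
from pysparkling import H2OContext

sc = SparkContext("local", "Simple App")
hc = H2OContext(sc).start()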

Would it work for you?
Michal

Ankit Arya

Dec 4, 2015, 3:10:21 PM
to h2os...@googlegroups.com, ja...@h2o.ai

---------- Forwarded message ----------
From: Ankit Arya <ankit....@gmail.com>
Date: Fri, Dec 4, 2015 at 12:09 PM
Subject: Re: [h2ostream] Using pysparkling through an IDE
To: mic...@h2oai.com


Hi Michal,

Thanks a ton! Got it to work. I had to modify the spark-defaults.conf file and add the Sparkling Water jar there permanently.

I will write up a blog post or something to make it easy for anyone else looking to do the same thing (and will share the link).

Cheers,
Ankit Arya

mika dup

Aug 24, 2016, 6:20:03 AM
to H2O Open Source Scalable Machine Learning - h2ostream, ja...@h2o.ai, ankit....@gmail.com
Hi,

Can you explain the modifications you made in your spark-defaults.conf file?

I have the same problem, and I wrote this:

spark.jars C:/Users/a013528/sparkling-water-1.6.5/assembly/build/libs/sparkling-water-assembly-1.6.5-all.jar

but it doesn't work.

Thank you for your help

Jakub Hava

Aug 31, 2016, 8:05:48 AM
to mika dup, H2O Open Source Scalable Machine Learning - h2ostream, Jakub Hava, ankit....@gmail.com
Hi Mika,
you shouldn’t need to edit spark-defaults.conf at all.

You either need to set the --jars option of spark-shell or spark-submit with an argument pointing to the Sparkling Water assembly jar,
or you can use the Spark packages option directly:
--packages ai.h2o:sparkling-water-core_2.10:1.5.6,ai.h2o:sparkling-water-examples_2.10:1.5.6

./spark-submit … --jars path/to/sparkling-water.jar
or 
./spark-submit … --packages ai.h2o:sparkling-water-core_2.10:1.5.6,ai.h2o:sparkling-water-examples_2.10:1.5.6

Kuba