I am using the Python Anaconda distribution. I can run Spark jobs, so pyspark is fine. For H2O, I can successfully call h2o.init() to set up a connection, so that seems fine too. So it pretty much comes down to pysparkling.
As you can see in the log, it is not able to find the H2OContext class.
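For reference, this is roughly what my script (Pyspark_setup.py) does — a minimal sketch of the failing setup, assuming pyspark, pysparkling, and h2o are installed (the master/app-name settings here are illustrative, not my exact config):

```python
# Minimal reproduction sketch of the failure below.
# Requires pyspark, pysparkling, and h2o to be installed.
from pyspark import SparkConf, SparkContext
from pysparkling import H2OContext

# Illustrative local configuration
conf = SparkConf().setAppName("pysparkling-test").setMaster("local[*]")
sc = SparkContext(conf=conf)

# This is the call that raises
# py4j.protocol.Py4JJavaError: java.lang.ClassNotFoundException:
#   org.apache.spark.h2o.H2OContext
hc = H2OContext(sc).start()
```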
The whole log:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/12/03 16:22:34 INFO SparkContext: Running Spark version 1.5.2
15/12/03 16:22:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/12/03 16:22:34 INFO SecurityManager: Changing view acls to: aarya
15/12/03 16:22:34 INFO SecurityManager: Changing modify acls to: aarya
15/12/03 16:22:34 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(aarya); users with modify permissions: Set(aarya)
15/12/03 16:22:35 INFO Slf4jLogger: Slf4jLogger started
15/12/03 16:22:35 INFO Remoting: Starting remoting
15/12/03 16:22:35 INFO Utils: Successfully started service 'sparkDriver' on port 64641.
15/12/03 16:22:35 INFO SparkEnv: Registering MapOutputTracker
15/12/03 16:22:35 INFO SparkEnv: Registering BlockManagerMaster
15/12/03 16:22:35 INFO DiskBlockManager: Created local directory at /private/var/folders/x3/sl62gc8n74l_cl299s2yxxyc0000gp/T/blockmgr-750eaf25-f633-428e-87be-db309e1efc2b
15/12/03 16:22:35 INFO MemoryStore: MemoryStore started with capacity 530.0 MB
15/12/03 16:22:35 INFO HttpFileServer: HTTP File server directory is /private/var/folders/x3/sl62gc8n74l_cl299s2yxxyc0000gp/T/spark-fe26f95f-ac1f-44b5-b400-406a26e02014/httpd-b8635885-5656-4890-8e5f-f0f3c0916044
15/12/03 16:22:35 INFO HttpServer: Starting HTTP Server
15/12/03 16:22:35 INFO Utils: Successfully started service 'HTTP file server' on port 64642.
15/12/03 16:22:35 INFO SparkEnv: Registering OutputCommitCoordinator
15/12/03 16:22:35 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
15/12/03 16:22:35 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
15/12/03 16:22:35 INFO Utils: Successfully started service 'SparkUI' on port 4042.
15/12/03 16:22:35 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/12/03 16:22:35 INFO Executor: Starting executor ID driver on host localhost
15/12/03 16:22:35 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 64643.
15/12/03 16:22:35 INFO NettyBlockTransferService: Server created on 64643
15/12/03 16:22:35 INFO BlockManagerMaster: Trying to register BlockManager
15/12/03 16:22:35 INFO BlockManagerMasterEndpoint: Registering block manager localhost:64643 with 530.0 MB RAM, BlockManagerId(driver, localhost, 64643)
15/12/03 16:22:35 INFO BlockManagerMaster: Registered BlockManager
Traceback (most recent call last):
File "/Users/aarya/PycharmProjects/Decision-Lists/Pyspark_setup.py", line 20, in <module>
hc= H2OContext(sc).start()
File "build/bdist.linux-x86_64/egg/pysparkling/context.py", line 72, in __init__
File "build/bdist.linux-x86_64/egg/pysparkling/context.py", line 96, in _do_init
File "/Users/aarya/software/spark-1.5.2-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
File "/Users/aarya/software/spark-1.5.2-bin-hadoop2.6/python/pyspark/sql/utils.py", line 36, in deco
return f(*a, **kw)
File "/Users/aarya/software/spark-1.5.2-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o21.loadClass.
: java.lang.ClassNotFoundException: org.apache.spark.h2o.H2OContext
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/Users/aarya/anaconda/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "/Users/aarya/anaconda/lib/python2.7/site-packages/h2o/connection.py", line 565, in end_session
H2OConnection.delete(url_suffix="InitID")
File "/Users/aarya/anaconda/lib/python2.7/site-packages/h2o/connection.py", line 392, in delete
raise ValueError("No h2o connection. Did you run `h2o.init()` ?")
ValueError: No h2o connection. Did you run `h2o.init()` ?
Error in sys.exitfunc:
Traceback (most recent call last):
File "/Users/aarya/anaconda/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "/Users/aarya/anaconda/lib/python2.7/site-packages/h2o/connection.py", line 565, in end_session
H2OConnection.delete(url_suffix="InitID")
File "/Users/aarya/anaconda/lib/python2.7/site-packages/h2o/connection.py", line 392, in delete
raise ValueError("No h2o connection. Did you run `h2o.init()` ?")
ValueError: No h2o connection. Did you run `h2o.init()` ?
15/12/03 16:22:36 INFO SparkContext: Invoking stop() from shutdown hook