Need advice on what is going wrong with the spark-cassandra connector:
hadoop@testbedocg:/mnt> pyspark --jars /usr/local/src/spark-cassandra-connector_2.10-1.6.0.jar
Python 2.7.11 (default, May 26 2016, 19:51:40)
[GCC 4.3.4 [gcc-4_3-branch revision 152973]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
16/06/23 12:20:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/
Using Python version 2.7.11 (default, May 26 2016 19:51:40)
SparkContext available as sc, HiveContext available as sqlContext.
>>> sqlContext.read\
... .format("org.apache.spark.sql.cassandra")\
... .options(table="nl_lebara_diameter_codes", keyspace="lebara_diameter_codes")\
... .load().show()
16/06/23 12:21:07 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/06/23 12:21:07 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/06/23 12:21:25 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/06/23 12:21:26 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/06/23 12:21:30 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/06/23 12:21:30 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/06/23 12:21:33 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/06/23 12:21:33 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
File "/mnt/spark/python/pyspark/sql/readwriter.py", line 139, in load
return self._df(self._jreader.load())
File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/mnt/spark/python/pyspark/sql/utils.py", line 45, in deco
return f(*a, **kw)
File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o23.load.
: java.lang.NoClassDefFoundError: com/datastax/driver/core/ConsistencyLevel
at com.datastax.spark.connector.rdd.ReadConf$.<init>(ReadConf.scala:46)
at com.datastax.spark.connector.rdd.ReadConf$.<clinit>(ReadConf.scala)
at org.apache.spark.sql.cassandra.DefaultSource$.<init>(DefaultSource.scala:131)
at org.apache.spark.sql.cassandra.DefaultSource$.<clinit>(DefaultSource.scala)
at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:54)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.datastax.driver.core.ConsistencyLevel
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 18 more
>>>
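For anyone hitting the same trace: the NoClassDefFoundError for com.datastax.driver.core.ConsistencyLevel usually means the plain spark-cassandra-connector jar was put on the classpath on its own, without the DataStax Java driver it depends on. One common fix is to let Spark resolve the transitive dependencies via --packages instead of pointing --jars at a single jar. A sketch, assuming the datastax:spark-cassandra-connector:1.6.0-s_2.10 spark-packages coordinate matches the Spark 1.6.1 / Scala 2.10 setup shown above, and with <cassandra-host> standing in for your own contact point:

hadoop@testbedocg:/mnt> pyspark \
  --packages datastax:spark-cassandra-connector:1.6.0-s_2.10 \
  --conf spark.cassandra.connection.host=<cassandra-host>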
Is there any way this can be installed offline?
Yes, I was fighting with this issue for the last couple of days and just made it work.
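On the offline part: --packages needs network access to resolve the artifacts, so on an air-gapped box the usual workaround is to build or download the assembly ("fat") jar, which bundles the Java driver and the other dependencies, copy it over, and pass that single jar to --jars. A sketch, assuming an assembly jar built with sbt assembly from the 1.6.0 connector sources (the exact file name may differ per build):

hadoop@testbedocg:/mnt> pyspark --jars /usr/local/src/spark-cassandra-connector-assembly-1.6.0.jar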
>>> sqlContext.read.format("org.apache.spark.sql.cassandra")\
... .options(table="nl_lebara_diameter_codes", keyspace="lebara_diameter_codes").load().show()
+-------+----+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
|service| ocg| date|errorcode2001|errorcode4010|errorcode4012|errorcode4998|errorcode4999|errorcode5007|errorcode5012|errorcode5030|
+-------+----+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
| gprs|see3|2016-06-14 10:50:...| 2625| 10| 0| 0| 37| 0| 655| null|
| gprs|see3|2016-06-14 10:55:...| 2823| 3| 0| 0| 46| 0| 738| null|
| gprs|see3|2016-06-14 12:10:...| 2816| 0| 0| 0| 49| 0| 745| null|
| gprs|see3|2016-06-14 12:30:...| 2734| 4| 0| 0| 56| 0| 666| null|
| gprs|see3|2016-06-14 12:40:...| 2762| 0| 0| 0| 52| 0| 703| null|
| gprs|see3|2016-06-14 12:45:...| 3045| 5| 0| 0| 42| 0| 709| null|
| gprs|see3|2016-06-14 12:50:...| 2749| 0| 0| 0| 35| 0| 692| null|
| gprs|see3|2016-06-14 13:25:...| 2938| 6| 0| 0| 46| 0| 723| null|
| gprs|see3|2016-06-14 20:40:...| 2272| 1| 0| 0| 73| 0| 517| null|
| gprs|see3|2016-06-14 20:45:...| 2421| 3| 0| 0| 76| 0| 515| null|
| gprs|see3|2016-06-14 20:50:...| 2431| 0| 0| 0| 70| 0| 496| null|
| gprs|see3|2016-06-14 21:10:...| 2434| 7| 0| 0| 61| 0| 497| null|
| gprs|see3|2016-06-14 23:00:...| 0| 0| 0| 0| 0| 0| 0| null|
| gprs|see3|2016-06-14 23:05:...| 1140| 1| 0| 0| 49| 0| 370| null|
| gprs|see3|2016-06-14 23:10:...| 0| 0| 0| 0| 0| 0| 1| null|
| gprs|see3|2016-06-14 23:15:...| 0| 0| 0| 0| 0| 0| 1| null|
| gprs|see3|2016-06-14 23:20:...| 0| 0| 0| 0| 0| 0| 0| null|
| gprs|see3|2016-06-14 23:50:...| 1408| 2| 0| 0| 35| 0| 322| null|
| gprs|see3|2016-06-14 23:55:...| 1354| 1| 0| 0| 28| 0| 341| null|
| gprs|see3|2016-06-15 00:00:...| 1331| 2| 0| 0| 32| 0| 337| null|
+-------+----+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
only showing top 20 rows
>>>
Perhaps you should look at the API docs for show:
def show(truncate: Boolean): Unit
Displays the top 20 rows of DataFrame in a tabular form.
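In PySpark the same method takes keyword arguments (show(n=20, truncate=True) in 1.6), so both the row count and the column truncation can be controlled. A minimal sketch against the table read above:

>>> df = sqlContext.read.format("org.apache.spark.sql.cassandra")\
...     .options(table="nl_lebara_diameter_codes", keyspace="lebara_diameter_codes").load()
>>> df.show(n=50, truncate=False)  # 50 rows, with full timestamps instead of "2016-06-14 10:50:..."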
Thanks, I will start looking at the API docs, as the config is fine now.
Next round: fighting with the API :)