Py4JJavaError: An error occurred while calling o23.load


Joaquin Henriquez

Jun 23, 2016, 8:33:39 AM
to DataStax Spark Connector for Apache Cassandra
Hi Guys

Need advice on what is going wrong with the spark-cassandra connector:

hadoop@testbedocg:/mnt> pyspark --jars /usr/local/src/spark-cassandra-connector_2.10-1.6.0.jar
Python 2.7.11 (default, May 26 2016, 19:51:40)
[GCC 4.3.4 [gcc-4_3-branch revision 152973]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
16/06/23 12:20:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 1.6.1
/_/

Using Python version 2.7.11 (default, May 26 2016 19:51:40)
SparkContext available as sc, HiveContext available as sqlContext.
>>> sqlContext.read\
... .format("org.apache.spark.sql.cassandra")\
... .options(table="nl_lebara_diameter_codes", keyspace="lebara_diameter_codes")\
... .load().show()
16/06/23 12:21:07 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/06/23 12:21:07 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/06/23 12:21:25 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/06/23 12:21:26 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/06/23 12:21:30 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/06/23 12:21:30 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/06/23 12:21:33 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/06/23 12:21:33 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
File "/mnt/spark/python/pyspark/sql/readwriter.py", line 139, in load
return self._df(self._jreader.load())
File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/mnt/spark/python/pyspark/sql/utils.py", line 45, in deco
return f(*a, **kw)
File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o23.load.
: java.lang.NoClassDefFoundError: com/datastax/driver/core/ConsistencyLevel
at com.datastax.spark.connector.rdd.ReadConf$.<init>(ReadConf.scala:46)
at com.datastax.spark.connector.rdd.ReadConf$.<clinit>(ReadConf.scala)
at org.apache.spark.sql.cassandra.DefaultSource$.<init>(DefaultSource.scala:131)
at org.apache.spark.sql.cassandra.DefaultSource$.<clinit>(DefaultSource.scala)
at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:54)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.datastax.driver.core.ConsistencyLevel
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 18 more

>>>

Jaroslaw Grabowski

Jun 23, 2016, 9:20:21 AM
to spark-conn...@lists.datastax.com
Hello,

Please see the preferred way of launching pyspark with the Connector (via --packages) here:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/15_python.md

Sticking with jars would require building a huge classpath.
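
For example, something along these lines pulls the connector and all of its dependencies (including the Java driver whose ConsistencyLevel class is missing in your stack trace) from Maven Central; the version should match your Spark/Scala build:

pyspark --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0

>>> sqlContext.read\
...     .format("org.apache.spark.sql.cassandra")\
...     .options(table="nl_lebara_diameter_codes", keyspace="lebara_diameter_codes")\
...     .load().show()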


--

JAROSLAW GRABOWSKI
Software Engineer





Joaquin Henriquez

Jun 23, 2016, 12:22:23 PM
to DataStax Spark Connector for Apache Cassandra
The machines I am working on do not have Internet access, so they cannot get the packages they need.

Is there any way this can be installed offline?

Joaquin Henriquez

Jun 23, 2016, 12:34:41 PM
to DataStax Spark Connector for Apache Cassandra
On Thursday, June 23, 2016 at 17:22:23 (UTC+1), Joaquin Henriquez wrote:
> The machines I am working on do not have Internet access, so they cannot get the packages they need.
>
> Is there any way this can be installed offline?

Doing it on a VM on my computer (with Internet access):
./bin/pyspark --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0

>>> sqlContext.read.format("org.apache.spark.sql.cassandra")
<pyspark.sql.readwriter.DataFrameReader object at 0x7f18b0ec8910>

Seems to work with no problem. Now wondering how it can be managed offline.

Joaquin Henriquez

Jun 26, 2016, 10:28:00 PM
to DataStax Spark Connector for Apache Cassandra
Offline:
1) Install Squid Proxy on your computer
2) export http_proxy=http://<ip>:3128
   export https_proxy=http://<ip>:3128
3) git clone <master_url>
4) sbt/sbt assembly
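
The jar that sbt assembly produces is a fat jar with the connector's dependencies bundled (including the com.datastax.driver classes that were missing in the first post), so it can be passed with --jars on the offline machine. A minimal sketch, assuming the assembly jar is copied to /usr/local/src (check the actual name and path the build prints):

pyspark --jars /usr/local/src/spark-cassandra-connector-assembly-1.6.0.jar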

Joaquin Henriquez

Jun 27, 2016, 11:02:02 AM
to DataStax Spark Connector for Apache Cassandra
Was fighting with this... now it's OK:

Yes, I was fighting with this issue for the last couple of days and just made it work.

>>> sqlContext.read.format("org.apache.spark.sql.cassandra")\
... .options(table="nl_lebara_diameter_codes", keyspace="lebara_diameter_codes").load().show()
+-------+----+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
|service| ocg| date|errorcode2001|errorcode4010|errorcode4012|errorcode4998|errorcode4999|errorcode5007|errorcode5012|errorcode5030|
+-------+----+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
| gprs|see3|2016-06-14 10:50:...| 2625| 10| 0| 0| 37| 0| 655| null|
| gprs|see3|2016-06-14 10:55:...| 2823| 3| 0| 0| 46| 0| 738| null|
| gprs|see3|2016-06-14 12:10:...| 2816| 0| 0| 0| 49| 0| 745| null|
| gprs|see3|2016-06-14 12:30:...| 2734| 4| 0| 0| 56| 0| 666| null|
| gprs|see3|2016-06-14 12:40:...| 2762| 0| 0| 0| 52| 0| 703| null|
| gprs|see3|2016-06-14 12:45:...| 3045| 5| 0| 0| 42| 0| 709| null|
| gprs|see3|2016-06-14 12:50:...| 2749| 0| 0| 0| 35| 0| 692| null|
| gprs|see3|2016-06-14 13:25:...| 2938| 6| 0| 0| 46| 0| 723| null|
| gprs|see3|2016-06-14 20:40:...| 2272| 1| 0| 0| 73| 0| 517| null|
| gprs|see3|2016-06-14 20:45:...| 2421| 3| 0| 0| 76| 0| 515| null|
| gprs|see3|2016-06-14 20:50:...| 2431| 0| 0| 0| 70| 0| 496| null|
| gprs|see3|2016-06-14 21:10:...| 2434| 7| 0| 0| 61| 0| 497| null|
| gprs|see3|2016-06-14 23:00:...| 0| 0| 0| 0| 0| 0| 0| null|
| gprs|see3|2016-06-14 23:05:...| 1140| 1| 0| 0| 49| 0| 370| null|
| gprs|see3|2016-06-14 23:10:...| 0| 0| 0| 0| 0| 0| 1| null|
| gprs|see3|2016-06-14 23:15:...| 0| 0| 0| 0| 0| 0| 1| null|
| gprs|see3|2016-06-14 23:20:...| 0| 0| 0| 0| 0| 0| 0| null|
| gprs|see3|2016-06-14 23:50:...| 1408| 2| 0| 0| 35| 0| 322| null|
| gprs|see3|2016-06-14 23:55:...| 1354| 1| 0| 0| 28| 0| 341| null|
| gprs|see3|2016-06-15 00:00:...| 1331| 2| 0| 0| 32| 0| 337| null|
+-------+----+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
only showing top 20 rows

>>>

Russell Spitzer

Jun 27, 2016, 11:05:35 AM
to DataStax Spark Connector for Apache Cassandra

Perhaps you should look at the API docs for show


Russell Spitzer

Jun 27, 2016, 11:08:31 AM
to DataStax Spark Connector for Apache Cassandra

def show(truncate: Boolean): Unit

Displays the top 20 rows of DataFrame in a tabular form.
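
In pyspark that corresponds to the truncate argument of show. A minimal sketch, reusing the DataFrame from the earlier snippet (100 is just an example row count):

>>> df = sqlContext.read.format("org.apache.spark.sql.cassandra")\
...     .options(table="nl_lebara_diameter_codes", keyspace="lebara_diameter_codes").load()
>>> df.show(100, truncate=False)  # full column values, so the timestamps are not cut to "..."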

--

Russell Spitzer
Software Engineer



Joaquin Henriquez

Jun 30, 2016, 1:22:56 PM
to DataStax Spark Connector for Apache Cassandra
Hi Russell

Thanks, I will start looking at the API as the config is fine now.

Next round: fighting with the API :)
