Need advice on what is going wrong with the spark-cassandra connector:
hadoop@testbedocg:/mnt> pyspark --jars /usr/local/src/spark-cassandra-connector_2.10-1.6.0.jar
Python 2.7.11 (default, May 26 2016, 19:51:40)
[GCC 4.3.4 [gcc-4_3-branch revision 152973]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
16/06/23 12:20:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/
Using Python version 2.7.11 (default, May 26 2016 19:51:40)
SparkContext available as sc, HiveContext available as sqlContext.
>>> sqlContext.read\
... .format("org.apache.spark.sql.cassandra")\
... .options(table="nl_lebara_diameter_codes", keyspace="lebara_diameter_codes")\
... .load().show()
16/06/23 12:21:07 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/06/23 12:21:07 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/06/23 12:21:25 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/06/23 12:21:26 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/06/23 12:21:30 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/06/23 12:21:30 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/06/23 12:21:33 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/06/23 12:21:33 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
File "/mnt/spark/python/pyspark/sql/readwriter.py", line 139, in load
return self._df(self._jreader.load())
File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/mnt/spark/python/pyspark/sql/utils.py", line 45, in deco
return f(*a, **kw)
File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o23.load.
: java.lang.NoClassDefFoundError: com/datastax/driver/core/ConsistencyLevel
at com.datastax.spark.connector.rdd.ReadConf$.<init>(ReadConf.scala:46)
at com.datastax.spark.connector.rdd.ReadConf$.<clinit>(ReadConf.scala)
at org.apache.spark.sql.cassandra.DefaultSource$.<init>(DefaultSource.scala:131)
at org.apache.spark.sql.cassandra.DefaultSource$.<clinit>(DefaultSource.scala)
at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:54)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.datastax.driver.core.ConsistencyLevel
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 18 more
>>>
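For anyone hitting the same trace: the NoClassDefFoundError for com.datastax.driver.core.ConsistencyLevel usually means the plain spark-cassandra-connector jar was put on the classpath on its own, without the DataStax Java driver it depends on. One common fix is to let Spark resolve the transitive dependencies via --packages instead of pointing --jars at a single jar. A sketch, assuming the datastax:spark-cassandra-connector:1.6.0-s_2.10 spark-packages coordinate matches the Spark 1.6.1 / Scala 2.10 setup shown above, and with <cassandra-host> standing in for your own contact point:

hadoop@testbedocg:/mnt> pyspark \
  --packages datastax:spark-cassandra-connector:1.6.0-s_2.10 \
  --conf spark.cassandra.connection.host=<cassandra-host>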
Is there any way this can be installed offline?
Yes, I was fighting with this issue for the last couple of days and just made it work.
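On the offline part: --packages needs network access to resolve the artifacts, so on an air-gapped box the usual workaround is to build or download the assembly ("fat") jar, which bundles the Java driver and the other dependencies, copy it over, and pass that single jar to --jars. A sketch, assuming an assembly jar built with sbt assembly from the 1.6.0 connector sources (the exact file name may differ per build):

hadoop@testbedocg:/mnt> pyspark --jars /usr/local/src/spark-cassandra-connector-assembly-1.6.0.jar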
>>> sqlContext.read.format("org.apache.spark.sql.cassandra")\
... .options(table="nl_lebara_diameter_codes", keyspace="lebara_diameter_codes").load().show()
+-------+----+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
|service| ocg| date|errorcode2001|errorcode4010|errorcode4012|errorcode4998|errorcode4999|errorcode5007|errorcode5012|errorcode5030|
+-------+----+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
| gprs|see3|2016-06-14 10:50:...| 2625| 10| 0| 0| 37| 0| 655| null|
| gprs|see3|2016-06-14 10:55:...| 2823| 3| 0| 0| 46| 0| 738| null|
| gprs|see3|2016-06-14 12:10:...| 2816| 0| 0| 0| 49| 0| 745| null|
| gprs|see3|2016-06-14 12:30:...| 2734| 4| 0| 0| 56| 0| 666| null|
| gprs|see3|2016-06-14 12:40:...| 2762| 0| 0| 0| 52| 0| 703| null|
| gprs|see3|2016-06-14 12:45:...| 3045| 5| 0| 0| 42| 0| 709| null|
| gprs|see3|2016-06-14 12:50:...| 2749| 0| 0| 0| 35| 0| 692| null|
| gprs|see3|2016-06-14 13:25:...| 2938| 6| 0| 0| 46| 0| 723| null|
| gprs|see3|2016-06-14 20:40:...| 2272| 1| 0| 0| 73| 0| 517| null|
| gprs|see3|2016-06-14 20:45:...| 2421| 3| 0| 0| 76| 0| 515| null|
| gprs|see3|2016-06-14 20:50:...| 2431| 0| 0| 0| 70| 0| 496| null|
| gprs|see3|2016-06-14 21:10:...| 2434| 7| 0| 0| 61| 0| 497| null|
| gprs|see3|2016-06-14 23:00:...| 0| 0| 0| 0| 0| 0| 0| null|
| gprs|see3|2016-06-14 23:05:...| 1140| 1| 0| 0| 49| 0| 370| null|
| gprs|see3|2016-06-14 23:10:...| 0| 0| 0| 0| 0| 0| 1| null|
| gprs|see3|2016-06-14 23:15:...| 0| 0| 0| 0| 0| 0| 1| null|
| gprs|see3|2016-06-14 23:20:...| 0| 0| 0| 0| 0| 0| 0| null|
| gprs|see3|2016-06-14 23:50:...| 1408| 2| 0| 0| 35| 0| 322| null|
| gprs|see3|2016-06-14 23:55:...| 1354| 1| 0| 0| 28| 0| 341| null|
| gprs|see3|2016-06-15 00:00:...| 1331| 2| 0| 0| 32| 0| 337| null|
+-------+----+--------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
only showing top 20 rows
>>>
Perhaps you should look at the API docs for show:
def show(truncate: Boolean): Unit
Displays the top 20 rows of DataFrame in a tabular form.
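In PySpark the same method takes keyword arguments (show(n=20, truncate=True) in 1.6), so both the row count and the column truncation can be controlled. A minimal sketch against the table read above:

>>> df = sqlContext.read.format("org.apache.spark.sql.cassandra")\
...     .options(table="nl_lebara_diameter_codes", keyspace="lebara_diameter_codes").load()
>>> df.show(n=50, truncate=False)  # 50 rows, with full timestamps instead of "2016-06-14 10:50:..."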
Thanks, I will start looking at the API docs, as the config is fine now.
Next round: fighting with the API :)