spark-submit - org.apache.spark.sql.types.DecimalType error


Rodrigo Faccioli

May 9, 2016, 1:47:59 PM
to DataStax Spark Connector for Apache Cassandra
Hi,

I am a newbie with both Spark and Cassandra.

I have worked with Cassandra from PHP, and that is working well.

Now I would like to use Spark with this Cassandra cluster. I tried to connect Spark to Cassandra with the following command: bin/spark-submit --jars ~/spark-cassandra-connector-assembly-1.6.0-M2.jar --driver-class-path ~/spark-cassandra-connector-assembly-1.6.0-M2.jar teste_spark_cassandra.py

The spark-cassandra-connector-assembly-1.6.0-M2.jar was obtained by following the tutorial at [1].

I received an error message:
py4j.protocol.Py4JJavaError: An error occurred while calling o26.load.
: java.lang.NoSuchMethodError: org.apache.spark.sql.types.DecimalType.<init>(II)V
at org.apache.spark.sql.cassandra.DataTypeConverter$.<init>(DataTypeConverter.scala:29)
at org.apache.spark.sql.cassandra.DataTypeConverter$.<clinit>(DataTypeConverter.scala)
at org.apache.spark.sql.cassandra.CassandraSourceRelation$$anonfun$schema$1$$anonfun$apply$1.apply(CassandraSourceRelation.scala:61)
at org.apache.spark.sql.cassandra.CassandraSourceRelation$$anonfun$schema$1$$anonfun$apply$1.apply(CassandraSourceRelation.scala:61)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at org.apache.spark.sql.cassandra.CassandraSourceRelation$$anonfun$schema$1.apply(CassandraSourceRelation.scala:61)
at org.apache.spark.sql.cassandra.CassandraSourceRelation$$anonfun$schema$1.apply(CassandraSourceRelation.scala:61)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.cassandra.CassandraSourceRelation.schema(CassandraSourceRelation.scala:61)
at org.apache.spark.sql.sources.LogicalRelation.<init>(LogicalRelation.scala:30)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:120)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)

[1] https://www.codementor.io/data-science/tutorial/installing-cassandra-spark-linux-debian-ubuntu-14

Naresh Dulam

May 9, 2016, 1:59:46 PM
to spark-conn...@lists.datastax.com
I am also facing this kind of issue because of mismatched versions of the driver jar, connector jar, Spark, and Cassandra. I have gone through the version compatibility matrix in the Spark documentation, but couldn't succeed. Does anyone else in the group currently have a working Cassandra/Spark integration environment?
Please share the version details you are using.

Russell Spitzer

May 9, 2016, 2:07:15 PM
to spark-conn...@lists.datastax.com
Try using the --packages launching method: http://spark-packages.org/package/datastax/spark-cassandra-connector
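For example, the launch command would look something like this (pick the artifact variant that matches your Spark/Scala build):

spark-submit --packages datastax:spark-cassandra-connector:1.6.0-M2-s_2.10 teste_spark_cassandra.py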

If that doesn't work, please forward:

1) Environment
2) Cassandra Schema
3) Spark script you are running
4) Command-line you are using to launch spark
5) Sample of the data which is causing the error
6) The complete error

Naresh Dulam

May 9, 2016, 2:09:11 PM
to spark-conn...@lists.datastax.com
Thanks, Russell.
I will post all the version details and the error message.

Rodrigo Faccioli

May 9, 2016, 2:26:49 PM
to spark-conn...@lists.datastax.com
Dear Russell,

Thanks for your attention. I tried your suggestion but, unfortunately, it didn't work.

Below is all the information you asked for in your previous message:

1) Environment
Ubuntu 15.10. Cassandra 3. Spark-1.4.1-bin-hadoop2.4 

2) Cassandra Schema

CREATE TABLE monitoramento (
    operadora text,
    dt_emissao timestamp,
    cod_benef text,
    nom_benef text,
    cod_emp text,
    nome_emp text,
    cidade text,
    uf text,
    regiao text,
    guia text,
    grupo_indicador text,
    local_atend text,
    valor_emitido float,
    valor_padrao float,
    valor_pago float,
    gr_emp text,
    nucleo text,
    ano_mes int,
    PRIMARY KEY (operadora, dt_emissao, guia)
);

3) Spark script you are running

4) Command-line you are using to launch spark
spark-submit --jars ~/spark-cassandra-connector-1.6.0-M2-s_2.11.jar teste_spark_cassandra.py

5) Sample of the data which is causing the error

6) The complete error
File "/home/faccioli/Execute/cassandra/spark/teste_spark_cassandra.py", line 10, in <module>
    monitoramento_rdd = sql.read.format("org.apache.spark.sql.cassandra").options(table="user", keyspace="devsaofran").load()
  File "/home/faccioli/Programs/spark-1.4.1-bin-hadoop2.4/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 112, in load
  File "/home/faccioli/Programs/spark-1.4.1-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/home/faccioli/Programs/spark-1.4.1-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o25.load.
: java.lang.NoClassDefFoundError: com/datastax/driver/core/ConsistencyLevel
at com.datastax.spark.connector.rdd.ReadConf$.<init>(ReadConf.scala:42)
at com.datastax.spark.connector.rdd.ReadConf$.<clinit>(ReadConf.scala)
at org.apache.spark.sql.cassandra.DefaultSource$.<init>(DefaultSource.scala:131)
at org.apache.spark.sql.cassandra.DefaultSource$.<clinit>(DefaultSource.scala)
at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:54)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:269)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.datastax.driver.core.ConsistencyLevel
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

Best regards,


--
Rodrigo Antonio Faccioli, Ph.D.
Data Scientist for Biochemistry and Biophysics
MI4U - http://mi4u.com.br/
University of Sao Paulo - http://www5.usp.br/

Russell Spitzer

May 9, 2016, 2:30:26 PM
to spark-conn...@lists.datastax.com
The main issue is that you are using Spark 1.4.1 built against Scala 2.10 with a Spark Cassandra Connector built against Spark 1.6.x and Scala 2.11.

If you want to use C* 3.0, I suggest upgrading your Spark build to 1.6.1 and using the connector artifact built against Scala 2.10:

--packages datastax:spark-cassandra-connector:1.6.0-M2-s_2.10
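That is, the full launch command would look something like:

spark-submit --packages datastax:spark-cassandra-connector:1.6.0-M2-s_2.10 teste_spark_cassandra.py

--packages resolves the connector and its transitive dependencies (including the Cassandra Java driver) from Spark Packages, so the --jars and --driver-class-path entries from your earlier commands should not be needed.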

Rodrigo Faccioli

May 9, 2016, 3:30:17 PM
to spark-conn...@lists.datastax.com
Dear Russell,

Thanks for your attention. Spark 1.6.1 and Cassandra are now working together.

Best regards,

--
Rodrigo Antonio Faccioli, Ph.D.
Data Scientist for Biochemistry and Biophysics
MI4U - http://mi4u.com.br/
University of Sao Paulo - http://www5.usp.br/

Russell Spitzer

May 9, 2016, 3:37:31 PM
to spark-conn...@lists.datastax.com
No problem, please update us with any other issues or concerns.

Naresh Dulam

May 9, 2016, 9:10:42 PM
to spark-conn...@lists.datastax.com
Which Cassandra driver version are you using?

Russell Spitzer

May 9, 2016, 9:15:26 PM
to spark-conn...@lists.datastax.com

There is no need to specify a Cassandra driver; it is a dependency of the connector.

Naresh Dulam

May 9, 2016, 9:18:51 PM
to spark-conn...@lists.datastax.com
Hi Russell,

You mean we don't need to copy the Cassandra driver into the Spark installation's lib folder, correct? The dependency will be part of the connector?




Regards,
Naresh 

Russell Spitzer

May 9, 2016, 9:19:49 PM
to spark-conn...@lists.datastax.com

Yes. If you use --packages, or a fat jar built with sbt assembly, the driver is included automatically.
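For example, in an sbt build you declare only the connector and the Java driver comes in transitively; a minimal build.sbt sketch (versions assumed from this thread):

// Scala 2.10 line, matching the -s_2.10 connector artifacts (version assumed).
scalaVersion := "2.10.6"

// The Cassandra Java driver (com.datastax.cassandra:cassandra-driver-core)
// is pulled in transitively; no separate dependency is needed.
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.6.0-M2"

Running sbt assembly (with the sbt-assembly plugin configured) then produces a fat jar that can be passed to --jars.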

Naresh Dulam

May 10, 2016, 11:03:30 PM
to spark-conn...@lists.datastax.com
Hi All,

I am not able to connect to Cassandra from the spark-shell. Below are the details of my environment:

Spark version:      1.6.1
Scala version:      2.10.4
Cassandra version:  3.0.0
Connector jar:      spark-cassandra-connector-1.6.0-M2-s_2.10.jar

I started spark-shell with the following command:

spark-shell --jars spark-cassandra-connector-1.6.0-M2-s_2.10.jar


import com.datastax.spark.connector._
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

// spark-shell already provides a SparkContext as sc, so stop it
// before creating a replacement with custom settings.
sc.stop()

// Set the Cassandra host on the conf *before* constructing the context;
// calling sc.getConf.set(...) after construction has no effect.
val conf = new SparkConf(true).set("spark.cassandra.connection.host", "localhost")
val sc = new SparkContext("local[2]", "test", conf)
val personRDD = sc.cassandraTable("people", "person")


Below is the error that pops up:



warning: Class org.joda.convert.FromString not found - continuing with a stub.
java.lang.NoClassDefFoundError: com/datastax/driver/core/ProtocolOptions$Compression
at com.datastax.spark.connector.cql.CassandraConnectorConf$.<init>(CassandraConnectorConf.scala:157)
at com.datastax.spark.connector.cql.CassandraConnectorConf$.<clinit>(CassandraConnectorConf.scala)
at com.datastax.spark.connector.cql.CassandraConnector$.apply(CassandraConnector.scala:190)
at com.datastax.spark.connector.SparkContextFunctions.cassandraTable$default$3(SparkContextFunctions.scala:52)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:50)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:52)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:54)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:56)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:58)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:60)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:62)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:64)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:66)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:68)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:70)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:72)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:74)


Please help with this; it is a big blocker for us.


Regards,
Naresh

Russell Spitzer

May 10, 2016, 11:09:20 PM
to spark-conn...@lists.datastax.com
You need all the dependencies that come with the connector; you can't use only the core jar. That is why I suggest using --packages, which brings in all the dependencies as well. If you use --jars, you need the fat jar created by sbt assembly.

Packages example:

spark-shell --packages datastax:spark-cassandra-connector:1.6.0-M2-s_2.10
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/13_spark_shell.md#starting-the-spark-shell
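Once the shell is started that way, the built-in sc is on the right classpath, and a quick smoke test could look like this (keyspace and table names taken from your earlier snippet; this assumes the connection host is set, e.g. via --conf spark.cassandra.connection.host=localhost):

import com.datastax.spark.connector._

// Count rows in people.person to verify the connector and driver are loaded.
sc.cassandraTable("people", "person").count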

Naresh Dulam

May 10, 2016, 11:31:00 PM
to spark-conn...@lists.datastax.com
Thanks, Russell.
Now I am able to integrate Spark and Cassandra!

You're my saviour! I had been struggling for a couple of weeks and had tried a lot of version combinations.


