Hey all,
I’m running what should be a very straightforward application of the Spark Cassandra connector's SQL data source, and I’m getting an error. I'm relatively new to Spark and Scala, so I'm sure there's something I'm doing wrong, but the error doesn't make much sense to me.
I'm submitting this to Spark as follows:
spark-submit test-2.0.5-SNAPSHOT-jar-with-dependencies.jar
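For completeness: I haven't tried pulling the connector in at submit time instead of bundling it, but I believe something like the following would also be an option (the coordinates are just copied from my pom below):
spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.10:1.2.4 test-2.0.5-SNAPSHOT-jar-with-dependencies.jar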
This is the error I'm getting. My jar is an assembly (fat) jar with all dependencies bundled, so I assumed this shouldn't happen. I've confirmed that the org.apache.spark.sql.cassandra and org.apache.cassandra classes are present in the jar (there's also a small class-loading check I can run, shown after the stack trace).
Exception in thread "main" java.lang.RuntimeException: Failed to load class for data source: org.apache.spark.sql.cassandra
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl.scala:220)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:233)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at com.latticeengines.test.CassandraTest$.main(CassandraTest.scala:33)
at com.latticeengines.test.CassandraTest.main(CassandraTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/07/30 15:34:47 INFO spark.SparkContext: Invoking stop() from shutdown hook
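In case it helps narrow things down, here's a minimal check I can run from the same fat jar to see whether the data-source class resolves at runtime. It's just a sketch: it assumes Spark 1.4 resolves the format string by trying the provider name and then provider + ".DefaultSource" (which is what the ResolvedDataSource.lookupDataSource frame in the trace suggests), and the LoadCheck object name is something I made up for the test:

object LoadCheck {
  def main(args: Array[String]): Unit = {
    // Try both names that lookupDataSource appears to attempt for this format string
    Seq("org.apache.spark.sql.cassandra",
        "org.apache.spark.sql.cassandra.DefaultSource").foreach { name =>
      try {
        Class.forName(name)
        println(s"loaded: $name")
      } catch {
        case _: ClassNotFoundException => println(s"NOT found: $name")
      }
    }
  }
}

If neither name loads, at least I'd know it's a packaging/classpath issue on the driver rather than something in my application code.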
Here’s the code I’m trying to run:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.mean

object CassandraTest {
  def main(args: Array[String]): Unit = {
    println("Hello, scala!")
    // Point the connector at the local Cassandra node
    val conf = new SparkConf(true).set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    // Load the test.kv table through the Cassandra data source -- this is the line that fails
    val df = sqlContext
      .read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("table" -> "kv", "keyspace" -> "test"))
      .load()
    // Sliding mean over the current row and the two preceding it
    val w = Window.orderBy("value").rowsBetween(-2, 0)
    df.select(mean("value").over(w)).show()
  }
}
Here's my pom:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<artifactId>test</artifactId>
<packaging>jar</packaging>
<name>${component-name}</name>
<properties>
<component-name>le-sparkdb</component-name>
<hadoop.version>2.6.0.2.2.0.0-2041</hadoop.version>
<scala.version>2.10.4</scala.version>
<spark.version>1.4.1</spark.version>
<avro.version>1.7.7</avro.version>
<parquet.avro.version>1.4.3</parquet.avro.version>
<le.domain.version>2.0.5-SNAPSHOT</le.domain.version>
<le.common.version>2.0.5-SNAPSHOT</le.common.version>
<le.eai.version>2.0.5-SNAPSHOT</le.eai.version>
<spark.cassandra.version>1.2.4</spark.cassandra.version>
</properties>
<parent>
<groupId>com.latticeengines</groupId>
<artifactId>le-parent</artifactId>
<version>2.0.5-SNAPSHOT</version>
<relativePath>le-parent</relativePath>
</parent>
<build>
<plugins>
<!-- <plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
<archive>
<manifest>
<mainClass>com.latticeengines.test.CassandraTest</mainClass>
</manifest>
</archive>
</configuration>
</plugin> -->
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass>com.latticeengines.test.CassandraTest</mainClass>
</manifest>
</archive>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-eclipse-plugin</artifactId>
<version>${maven.eclipse.version}</version>
<configuration>
<downloadSources>true</downloadSources>
<downloadJavadocs>true</downloadJavadocs>
<projectnatures>
<projectnature>org.scala-ide.sdt.core.scalanature</projectnature>
<projectnature>org.eclipse.jdt.core.javanature</projectnature>
</projectnatures>
<buildcommands>
<buildcommand>org.scala-ide.sdt.core.scalabuilder</buildcommand>
</buildcommands>
<classpathContainers>
<classpathContainer>org.scala-ide.sdt.launching.SCALA_CONTAINER</classpathContainer>
<classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
</classpathContainers>
<excludes>
<exclude>org.scala-lang:scala-library</exclude>
<exclude>org.scala-lang:scala-compiler</exclude>
</excludes>
<sourceIncludes>
<sourceInclude>**/*.scala</sourceInclude>
<sourceInclude>**/*.java</sourceInclude>
</sourceIncludes>
</configuration>
</plugin>
</plugins>
<sourceDirectory>src/main/scala</sourceDirectory>
</build>
<dependencies>
<dependency>
<groupId>com.twitter</groupId>
<artifactId>parquet-avro</artifactId>
<version>${parquet.avro.version}</version>
</dependency>
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro</artifactId>
<version>${avro.version}</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>com.latticeengines</groupId>
<artifactId>le-domain</artifactId>
<version>${le.domain.version}</version>
<exclusions>
<exclusion>
<groupId>javax.servlet</groupId>
<artifactId>servlet-api</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.cassandra</groupId>
<artifactId>cassandra-all</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>com.latticeengines</groupId>
<artifactId>le-common</artifactId>
<version>${le.common.version}</version>
<exclusions>
<exclusion>
<groupId>javax.servlet</groupId>
<artifactId>servlet-api</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.10</artifactId>
<version>${spark.cassandra.version}</version>
</dependency>
</dependencies>
</project>
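For reference, the fat jar above is produced with a plain mvn clean package, since the assembly plugin's single goal is bound to the package phase.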
Thanks so much in advance!