Is there any performance benchmark available for the Spark Cassandra Connector?


purnim...@iet.ahduni.edu.in

May 21, 2018, 2:25:46 AM
to DataStax Spark Connector for Apache Cassandra
I use the NYC taxi data set of approx. 30 million rows, about 5 GB of storage. I use sparklyr and the crassy package to read/write data between Spark and Cassandra. The experiments run on a cluster of 4 nodes (8 cores, 16 GB RAM each), with Spark and Cassandra collocated on every node. Reading the data from Cassandra into Spark as a DataFrame took 4.032 seconds. I then add two more columns to the DataFrame and write it back to Cassandra, which took 7176.550 seconds. Why is the read so fast and the write so slow? I used the default values for all Spark Cassandra Connector parameters; even when I change config$spark.cassandra.input.split.size_in_mb, I get almost the same read latency.

Are these results reasonable? Is there any benchmark available to compare my results against?

Russell Spitzer

May 21, 2018, 8:23:01 AM
to spark-conn...@lists.datastax.com
We usually measure analytics performance in throughput rather than latency. Our basic benchmark machines have 16 cores and 64 GB RAM each, and on our nightly runs against a 10-node, RF 3 cluster we see read performance (LOCAL_ONE) of

6M rows per second, or 600K rows per node per second.

We see write performance of

600K rows per second, or 60K rows per node per second.

Join with a Cassandra table (bulk point lookups) varies with the number of records requested, capping out at 6M rows per second, like scanning reads.
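For comparison with these throughput figures, the latency numbers from the original question can be converted into rows per second. This is just arithmetic on the numbers reported in the thread, not a measured benchmark:

```python
# Convert the reported latencies (30M rows, 4 nodes) into throughput,
# the same units used by the nightly benchmark numbers above.
rows = 30_000_000      # NYC taxi data set size, from the question
nodes = 4              # cluster size, from the question

read_seconds = 4.032
write_seconds = 7176.550

read_rate = rows / read_seconds      # ~7.4M rows/s total
write_rate = rows / write_seconds    # ~4.2K rows/s total

print(f"read:  {read_rate / nodes:,.0f} rows/node/s")
print(f"write: {write_rate / nodes:,.0f} rows/node/s")
```

Measured against the ~600K reads and ~60K writes per node per second above, the reported read would be about 3x faster than the benchmark, while the write is roughly 60x slower. One possible explanation, not confirmed in the thread, is that the DataFrame read was never materialized by an action, so the 4-second figure may not reflect a full scan.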

The application used to generate and run tests is 

https://github.com/datastax/spark-cassandra-stress

You can check it for the schema and write patterns (we used the "performance_row" setting).



--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.
--

Russell Spitzer
Software Engineer




Ravi Gottipati

May 22, 2018, 2:13:36 AM
to DataStax Spark Connector for Apache Cassandra

Actually, I am able to write 300K rows in 60 seconds; the data includes Maps and Lists, and each row has more than 150 columns. We used only a 4-node cluster with two replicas (2 cores, 4 GB RAM each). For writing we used two additional nodes that are not part of the cluster (4 cores, 4 GB RAM each). Even with that smaller configuration we were able to write all 30M rows, with large columns, within 6000 seconds. We used SparkR.

My observation on your problem: when writing data into Cassandra, I see more RAM and CPU used than when reading. If you use separate nodes for writing, you can write the data much faster than you do now, and your configuration is better than ours.

We have also tested with Julia; our team was able to write data into Cassandra about 10 times faster than with R. We can write our 30M-row data set in about 700 seconds using Julia with the C++ driver.

Overall we are loading 900M records into Cassandra. I do not see a problem with writing; if you use separate nodes for writing, you should get more throughput.
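As a sanity check on the figures above (all numbers taken from the message; this is only a consistency check), the rates can be compared directly:

```python
rows = 30_000_000

# SparkR: ~300K rows per 60 s, which matches finishing 30M rows in ~6000 s
sparkr_rate = 300_000 / 60         # 5,000 rows/s
sparkr_total = rows / sparkr_rate  # 6,000 s

# Julia with the C++ driver: 30M rows in ~700 s
julia_rate = rows / 700            # ~42,857 rows/s

print(f"SparkR total: {sparkr_total:,.0f} s")
print(f"Julia speedup over SparkR: {julia_rate / sparkr_rate:.1f}x")
```

The resulting ~8.6x ratio is consistent with the roughly 10x speedup claimed for Julia over R.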

purnim...@iet.ahduni.edu.in

May 23, 2018, 2:59:04 AM
to DataStax Spark Connector for Apache Cassandra

Thanks Russell, I will try the suggested tool and let you know the results.

purnim...@iet.ahduni.edu.in

May 23, 2018, 2:59:53 AM
to DataStax Spark Connector for Apache Cassandra

Thanks Ravi, this will help me a lot.

John Engstrom

May 24, 2018, 12:02:55 PM
to DataStax Spark Connector for Apache Cassandra
Russell, I'm getting an error when building spark-cassandra-stress. I cloned the git project on master rather than a branch. I'm building with Scala 2.11 and get the following Scala error:

:compileJava
:compileScala
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/apache/scala/com/datastax/sparkstress/ContextHelper.scala:6: error: ConnectHelper is already defined as object ConnectHelper
[ant:scalac] object ConnectHelper {
[ant:scalac] ^
[ant:scalac] one error found
:compileScala FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':compileScala'.
> Compile failed with 1 error; see the compiler error output for details.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

BUILD FAILED

Total time: 13.827 secs

Russell Spitzer

May 24, 2018, 12:40:45 PM
to spark-conn...@lists.datastax.com
Sorry about that; it seems a dead file was sitting there. I fixed it, and it should compile against OSS now.

John Engstrom

May 24, 2018, 1:05:25 PM
to DataStax Spark Connector for Apache Cassandra
Thanks Russell.

John Engstrom

May 24, 2018, 3:45:23 PM
to DataStax Spark Connector for Apache Cassandra
OK, one more question. When using my local Spark Cassandra Connector source to build the stress application, I get the following errors. It appears some properties file doesn't declare the right dependencies, something along the lines of the edits I made to my build.sbt file to get my Scala app running under spark-submit:
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.2"
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.7"


Here's the list of errors I get when trying to build with "./gradlew jar -Pagainst=source". It does build fine with "./gradlew jar -Pagainst=maven":


[success] Total time: 30 s, completed May 24, 2018 12:03:39 PM
:compileJava
:compileScala
[ant:scalac] warning: Class org.joda.convert.FromString not found - continuing with a stub.
[ant:scalac] warning: Class org.joda.convert.ToString not found - continuing with a stub.
[ant:scalac] warning: Class org.joda.convert.ToString not found - continuing with a stub.
[ant:scalac] warning: Class org.joda.convert.FromString not found - continuing with a stub.
[ant:scalac] warning: Class org.joda.convert.ToString not found - continuing with a stub.
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/ReadTask.scala:7: error: object spark is not a member of package com.datastax
[ant:scalac] import com.datastax.spark.connector.cql.CassandraConnector
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/ReadTask.scala:9: error: object spark is not a member of package com.datastax
[ant:scalac] import com.datastax.spark.connector._
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/ReadTask.scala:12: error: object cassandra is not a member of package org.apache.spark.sql
[ant:scalac] import org.apache.spark.sql.cassandra._
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/ReadTask.scala:25: error: not found: value CassandraConnector
[ant:scalac] val numberNodes = CassandraConnector(sc.getConf).withClusterDo(_.getMetadata.getAllHosts.size)
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/ReadTask.scala:52: error: value cassandraFormat is not a member of org.apache.spark.sql.DataFrameReader
[ant:scalac] possible cause: maybe a semicolon is missing before `value cassandraFormat'?
[ant:scalac] .cassandraFormat(table, keyspace)
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/ReadTask.scala:74: error: value cassandraTable is not a member of org.apache.spark.SparkContext
[ant:scalac] case RDD => sc.cassandraTable[String](keyspace, table).select("color", "size").count
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/ReadTask.scala:89: error: value cassandraTable is not a member of org.apache.spark.SparkContext
[ant:scalac] case RDD => sc.cassandraTable[String](keyspace, table)
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/ReadTask.scala:106: error: value cassandraTable is not a member of org.apache.spark.SparkContext
[ant:scalac] case RDD => sc.cassandraTable[String](keyspace, table).select("color", "size", "qty",
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/ReadTask.scala:121: error: value cassandraTable is not a member of org.apache.spark.SparkContext
[ant:scalac] case RDD => sc.cassandraTable(keyspace, table).cassandraCount()
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/ReadTask.scala:135: error: value cassandraTable is not a member of org.apache.spark.SparkContext
[ant:scalac] case RDD => sc.cassandraTable[String](keyspace, table).select("color").count
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/ReadTask.scala:149: error: value cassandraTable is not a member of org.apache.spark.SparkContext
[ant:scalac] case RDD => sc.cassandraTable[PerfRowClass](keyspace, table).count
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/ReadTask.scala:165: error: value cassandraTable is not a member of org.apache.spark.SparkContext
[ant:scalac] sc.cassandraTable[(UUID, Int, String, String, org.joda.time.DateTime)](keyspace,
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/ReadTask.scala:182: error: value cassandraTable is not a member of org.apache.spark.SparkContext
[ant:scalac] sc.cassandraTable[PerfRowClass](keyspace, table)
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/ReadTask.scala:202: error: value cassandraTable is not a member of org.apache.spark.SparkContext
[ant:scalac] sc.cassandraTable[(UUID, Int, String, String, org.joda.time.DateTime)](keyspace, table)
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/ReadTask.scala:224: error: value joinWithCassandraTable is not a member of org.apache.spark.rdd.RDD[(String,)]
[ant:scalac] possible cause: maybe a semicolon is missing before `value joinWithCassandraTable'?
[ant:scalac] .joinWithCassandraTable[PerfRowClass](keyspace, table)
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/ReadTask.scala:248: error: value repartitionByCassandraReplica is not a member of org.apache.spark.rdd.RDD[(String,)]
[ant:scalac] possible cause: maybe a semicolon is missing before `value repartitionByCassandraReplica'?
[ant:scalac] .repartitionByCassandraReplica(keyspace, table, coresPerNode)
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/ReadTask.scala:265: error: value joinWithCassandraTable is not a member of org.apache.spark.rdd.RDD[(String,)]
[ant:scalac] possible cause: maybe a semicolon is missing before `value joinWithCassandraTable'?
[ant:scalac] .joinWithCassandraTable[PerfRowClass](keyspace, table)
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/ReadTask.scala:288: error: value cassandraTable is not a member of org.apache.spark.SparkContext
[ant:scalac] sc.cassandraTable[String](keyspace, table)
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/StreamingTask.scala:4: error: object spark is not a member of package com.datastax
[ant:scalac] import com.datastax.spark.connector.streaming._
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/StreamingTask.scala:5: error: object spark is not a member of package com.datastax
[ant:scalac] import com.datastax.spark.connector.writer.RowWriterFactory
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/StreamingTask.scala:3: error: object spark is not a member of package com.datastax
[ant:scalac] import com.datastax.spark.connector.cql.CassandraConnector
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/StreamingTask.scala:42: error: not found: value CassandraConnector
[ant:scalac] val cc = CassandraConnector(ss.sparkContext.getConf)
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/WriteTask.scala:10: error: object spark is not a member of package com.datastax
[ant:scalac] import com.datastax.spark.connector._
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/WriteTask.scala:17: error: object cassandra is not a member of package org.apache.spark.sql
[ant:scalac] import org.apache.spark.sql.cassandra._
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/StreamingTask.scala:101: error: value saveToCassandra is not a member of org.apache.spark.streaming.dstream.DStream[com.datastax.sparkstress.RowTypes.PerfRowClass]
[ant:scalac] override def dstreamOps(dstream: DStream[PerfRowClass]): Unit = dstream.saveToCassandra(config.keyspace, config.table)
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/StressTask.scala:4: error: object spark is not a member of package com.datastax
[ant:scalac] import com.datastax.spark.connector.cql.CassandraConnector
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/StressTask.scala:20: error: not found: type CassandraConnector
[ant:scalac] def getLocalDC(cc: CassandraConnector): String = {
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/WriteTask.scala:3: error: object spark is not a member of package com.datastax
[ant:scalac] import com.datastax.spark.connector.cql.CassandraConnector
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/WriteTask.scala:4: error: object spark is not a member of package com.datastax
[ant:scalac] import com.datastax.spark.connector.writer.RowWriterFactory
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/WriteTask.scala:26: error: not found: type RowWriterFactory
[ant:scalac] (implicit rwf: RowWriterFactory[rowType]) extends StressTask {
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/WriteTask.scala:33: error: not found: value CassandraConnector
[ant:scalac] val cc = CassandraConnector(sc.getConf)
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/apache/com/datastax/bdp/spark/writer/BulkTableWriter.scala:6: error: object spark is not a member of package com.datastax
[ant:scalac] import com.datastax.spark.connector._
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/apache/com/datastax/bdp/spark/writer/BulkTableWriter.scala:7: error: object spark is not a member of package com.datastax
[ant:scalac] import com.datastax.spark.connector.writer._
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/WriteTask.scala:71: error: value cassandraFormat is not a member of org.apache.spark.sql.DataFrameWriter[org.apache.spark.sql.Row]
[ant:scalac] possible cause: maybe a semicolon is missing before `value cassandraFormat'?
[ant:scalac] .cassandraFormat(destination.table, destination.keyspace)
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/apache/com/datastax/bdp/spark/writer/BulkTableWriter.scala:23: error: not found: type ColumnSelector
[ant:scalac] columns: ColumnSelector = AllColumns,
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/WriteTask.scala:90: error: value saveToCassandra is not a member of org.apache.spark.rdd.RDD[rowType]
[ant:scalac] case SaveMethod.Driver => getRDD.saveToCassandra(destination.keyspace, destination.table)
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/WriteTask.scala:142: error: not found: type RowWriterFactory
[ant:scalac] WriteTask[ShortRowClass](config, ss)(implicitly[RowWriterFactory[ShortRowClass]]) {
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/WriteTask.scala:168: error: not found: type RowWriterFactory
[ant:scalac] WriteTask[PerfRowClass](config, ss)(implicitly[RowWriterFactory[PerfRowClass]]) {
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/WriteTask.scala:202: error: not found: type RowWriterFactory
[ant:scalac] WriteTask[WideRowClass](config, ss)(implicitly[RowWriterFactory[WideRowClass]]) {
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/WriteTask.scala:232: error: not found: type RowWriterFactory
[ant:scalac] WriteTask[WideRowClass](config, ss)(implicitly[RowWriterFactory[WideRowClass]]) {
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/WriteTask.scala:261: error: not found: type RowWriterFactory
[ant:scalac] WriteTask[WideRowClass](config, ss)(implicitly[RowWriterFactory[WideRowClass]]) {
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/WriteTask.scala:286: error: not found: type RowWriterFactory
[ant:scalac] WriteTask[PerfRowClass](config, ss)(implicitly[RowWriterFactory[PerfRowClass]]) {
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/WriteTask.scala:303: error: value cassandraTable is not a member of org.apache.spark.SparkContext
[ant:scalac] sc.cassandraTable[PerfRowClass](config.keyspace, config.table)
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/main/scala/com/datastax/sparkstress/WriteTask.scala:306: error: value cassandraFormat is not a member of org.apache.spark.sql.DataFrameReader
[ant:scalac] ss.read.cassandraFormat(config.table, config.keyspace).load()
[ant:scalac] ^
[ant:scalac] /Users/engstrom/spark_cassandra_stress/spark-cassandra-stress/src/apache/com/datastax/bdp/spark/writer/BulkTableWriter.scala:23: error: not found: value AllColumns
[ant:scalac] columns: ColumnSelector = AllColumns,
[ant:scalac] ^
[ant:scalac] 5 warnings found
[ant:scalac] 45 errors found
:compileScala FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':compileScala'.

> Compile failed with 45 errors; see the compiler error output for details.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

BUILD FAILED

Total time: 58.299 secs

Russell Spitzer

May 24, 2018, 3:59:55 PM
to spark-conn...@lists.datastax.com
Looks like the "source" build is failing to actually build the SCC main project; all of those errors are for SCC classes that should exist. The "source" build is supposed to build the SCC project and then use the newly built library as a dependency for the stress build. I can check this out later, but that does seem to be the crux of it.


John Engstrom

May 24, 2018, 4:01:03 PM
to DataStax Spark Connector for Apache Cassandra
Although it appears that the successful maven build doesn't run quite right:

engstrommac:spark-cassandra-stress engstrom$ ./run.sh apache --help
Submit Script:: spark-submit --class com.datastax.sparkstress.SparkCassandraStress build/libs/SparkCassandraStress-1.0.jar --help
18/05/24 15:00:10.562 INFO Reflections: Reflections took 224 ms to scan 1 urls, producing 32 keys and 329 values
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.collect.Sets$SetView.iterator()Lcom/google/common/collect/UnmodifiableIterator;
at org.reflections.Reflections.expandSuperTypes(Reflections.java:380)
at org.reflections.Reflections.<init>(Reflections.java:126)
at org.reflections.Reflections.<init>(Reflections.java:168)
at org.reflections.Reflections.<init>(Reflections.java:141)
at com.datastax.sparkstress.SparkCassandraStress$.<init>(SparkCassandraStress.scala:67)
at com.datastax.sparkstress.SparkCassandraStress$.<clinit>(SparkCassandraStress.scala)
at com.datastax.sparkstress.SparkCassandraStress.main(SparkCassandraStress.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:744)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
engstrommac:spark-cassandra-stress engstrom$

Russell Spitzer

May 24, 2018, 4:03:50 PM
to spark-conn...@lists.datastax.com
That looks like a guava mismatch ...


John Engstrom

May 24, 2018, 4:08:09 PM
to DataStax Spark Connector for Apache Cassandra
And here's a run with real parameters:

engstrommac:spark-cassandra-stress engstrom$ ./run.sh apache -p 4 -y 262144 -o 262144 -d -n 4 -S results.tsv -m driver writerandomwiderow
Submit Script:: spark-submit --class com.datastax.sparkstress.SparkCassandraStress build/libs/SparkCassandraStress-1.0.jar -p 4 -y 262144 -o 262144 -d -n 4 -S results.tsv -m driver writerandomwiderow
18/05/24 15:06:33.693 INFO Reflections: Reflections took 211 ms to scan 1 urls, producing 32 keys and 329 values

John Engstrom

May 24, 2018, 4:09:41 PM
to DataStax Spark Connector for Apache Cassandra

Russell, seeing as I can build it with maven, it's certainly not stopping me from doing anything. If you get a chance to look at it at some point, that would be great; otherwise I'm happy to use maven.

Russell Spitzer

May 24, 2018, 4:10:34 PM
to spark-conn...@lists.datastax.com
Let me ping the dev who added that. I'm not quite sure, but perhaps a dependency that is on the DSE classpath is missing from the runtime classpath.


John Engstrom

May 24, 2018, 4:12:17 PM
to DataStax Spark Connector for Apache Cassandra
On Thursday, May 24, 2018 at 3:03:50 PM UTC-5, Russell Spitzer wrote:
> That looks like a guava mismatch ...


Any chance that could be caused by something in my environment? Or is it more likely a mismatch in the Guava version being pulled down by maven?

John Engstrom

May 24, 2018, 4:13:01 PM
to DataStax Spark Connector for Apache Cassandra
On Thursday, May 24, 2018 at 3:10:34 PM UTC-5, Russell Spitzer wrote:
> Let me ping the dev who added that in, i'm not quite sure, perhaps a dependency is missing from the runtime classpath that is on the DSE classpath


Thanks, I appreciate that.

John Engstrom

May 25, 2018, 12:31:02 PM
to DataStax Spark Connector for Apache Cassandra
Any news from the dev who added the Guava dependency?

Also, just to let you know: when building with '-Pagainst=source' it IS building the SCC. I'm sure, because before the build there is no SCC assembly jar and after the build there is, and the gradle output shows the SCC being built and tested. But when it comes time to compile the Scala in spark-cassandra-stress, it appears unable to find it. I tried adding a 'compile fileTree(...)' to the build.gradle file, pointing at the directory under SPARKCC_HOME where the assembly jar gets built, but I ended up with the same Scala compile errors.

I'd appreciate any help, tips, or suggestions you could provide.

Thanks

Russell Spitzer

May 25, 2018, 8:00:36 PM
to spark-conn...@lists.datastax.com
I took a look: the Reflections library, which is used to get the test list, brings in a Guava 18.0 dependency. That version is present on the DSE classpath but not on the Apache Spark classpath, which (because of Hadoop) brings in Guava 11 or thereabouts. I tried just adding the dependency via --packages org.reflections:reflections:0.9.10, but it seems the older Guava still wins out. The way to make this work again is probably to shade the Guava used by the connector.

Russell Spitzer

May 25, 2018, 8:34:30 PM
to spark-conn...@lists.datastax.com

John Engstrom

May 29, 2018, 10:44:28 AM
to DataStax Spark Connector for Apache Cassandra
Russell, I just tried building with the changes in build.gradle, and apparently something about my environment dislikes the change from "sbt/sbt assembly" to "sbt/sbt jar". Here's the output when I try to build against source:

engstrommac:spark-cassandra-stress engstrom$ ./gradlew jar -Pagainst=source
Checking dependency flag: source
Using Assembly Jar from Source Repo
2.0.6
:build_connector
Attempting to fetch sbt
Launching sbt from sbt/sbt-launch-0.13.12.jar
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=350m; support was removed in 8.0
[info] Loading project definition from /Users/engstrom/spark_cassandra_stress/spark-cassandra-connector/project
[info] Updating {file:/Users/engstrom/spark_cassandra_stress/spark-cassandra-connector/project/}spark-cassandra-connector-build...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[warn] There may be incompatibilities among your library dependencies.
[warn] Here are some of the libraries that were evicted:
[warn] * net.virtual-void:sbt-dependency-graph:0.7.4 -> 0.8.2
[warn] Run 'evicted' to see detailed eviction warnings
[info] Compiling 7 Scala sources to /Users/engstrom/spark_cassandra_stress/spark-cassandra-connector/project/target/scala-2.10/sbt-0.13/classes...
[warn] there were 4 deprecation warning(s); re-run with -deprecation for details
[warn] there were 7 feature warning(s); re-run with -feature for details
[warn] two warnings found
Using releases: https://oss.sonatype.org/service/local/staging/deploy/maven2 for releases
Using snapshots: https://oss.sonatype.org/content/repositories/snapshots for snapshots

Scala: 2.10.6 [To build against Scala 2.11 use '-Dscala-2.11=true']
Scala Binary: 2.10
Java: target=1.7 user=1.8.0_92
Cassandra version for testing: 3.6 [can be overridden by specifying '-Dtest.cassandra.version=<version>']

[info] Set current project to root (in build file:/Users/engstrom/spark_cassandra_stress/spark-cassandra-connector/)
[error] Not a valid command: jar
[error] Not a valid key: jar (similar: run, apiUrl, target)
[error] jar
[error] ^
:build_connector FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':build_connector'.
> Process 'command 'sbt/sbt'' finished with non-zero exit value 1

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

BUILD FAILED

Total time: 37.935 secs
engstrommac:spark-cassandra-stress engstrom$

Russell Spitzer

May 29, 2018, 10:47:46 AM
to spark-conn...@lists.datastax.com
Sorry, I didn't mean to commit that; I just fixed the maven part.


Russell Spitzer

May 29, 2018, 10:48:01 AM
to spark-conn...@lists.datastax.com
That should probably be "package".