Does foreachPartition or foreachPartitionAsync execute in parallel across Spark workers?


kant kodali

Oct 9, 2016, 3:27:00 AM
to spark-conn...@lists.datastax.com
I want to run a computation on each partition, and since my computation is independent across partitions I would like it to execute in parallel. So I am wondering whether foreachPartition or foreachPartitionAsync can execute across Spark worker machines in parallel. It also sounds like foreachPartition is blocking and foreachPartitionAsync is non-blocking; it would be great if someone could explain where in the Spark execution path this makes a difference, and why.

javaFunctions(sc).cassandraTable().foreachPartition() 

Thanks a lot!

kant kodali

Oct 9, 2016, 4:12:49 AM
to spark-conn...@lists.datastax.com
javaFunctions(sc).cassandraTable().foreachPartition(): is that a Cassandra partition or an RDD partition? I read somewhere that multiple Cassandra partitions can be mapped to one RDD partition. Is that true?

Russell Spitzer

Oct 9, 2016, 11:27:25 AM
to spark-conn...@lists.datastax.com

Spark partition; token ranges from Cassandra are mapped to single Spark partitions.
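A rough illustration of that mapping (my own sketch, not the connector's actual code): a node's token range can be carved into contiguous sub-ranges, with each sub-range backing one Spark partition.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenRangeSplit {
    // Split the half-open token range [start, end) into n contiguous
    // sub-ranges; conceptually, each sub-range would back one Spark partition.
    static List<long[]> split(long start, long end, int n) {
        List<long[]> parts = new ArrayList<>();
        long step = (end - start) / n;
        for (int i = 0; i < n; i++) {
            long s = start + (long) i * step;
            long e = (i == n - 1) ? end : s + step;  // last part absorbs the remainder
            parts.add(new long[]{s, e});
        }
        return parts;
    }

    public static void main(String[] args) {
        for (long[] p : split(0, 100, 4)) {
            System.out.println(p[0] + " .. " + p[1]);
        }
    }
}
```

The real connector groups token ranges by size and replica locality; this sketch only shows that the node-to-partition relationship need not be one-to-one.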


--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.
--

kant kodali

Oct 9, 2016, 2:41:46 PM
to spark-conn...@lists.datastax.com
Hi Russell,

Is the entire token range of one Cassandra node mapped to a single Spark partition, so that the number of Cassandra nodes equals the number of Spark partitions? Or can the token range of a single Cassandra node be broken into multiple sub-ranges, with each sub-range mapped to a single Spark partition?

Thanks!

Russell Spitzer

Oct 9, 2016, 2:47:33 PM
to spark-conn...@lists.datastax.com

kant kodali

Oct 9, 2016, 3:50:57 PM
to spark-conn...@lists.datastax.com
Hi Russell,

Thanks for this video. It clarifies a lot of my questions, but how do I read all the Cassandra rows that belong to a particular Cassandra partition? And I do want to parallelize this process across Cassandra partitions.

kant

Russell Spitzer

Oct 9, 2016, 3:53:26 PM
to spark-conn...@lists.datastax.com

kant kodali

Oct 9, 2016, 7:39:38 PM
to spark-conn...@lists.datastax.com
Hi Russell,

I tried spanBy, but it looks like a strange error happens no matter which way I try it, like the one described here for the Java solution.



java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD


JavaPairRDD<ByteBuffer, Iterable<CassandraRow>> cassandraRowsRDD= javaFunctions(sc).cassandraTable("test", "hello" )
   .select("col1", "col2", "col3" )
   .spanBy(new Function<CassandraRow, ByteBuffer>() {
        @Override
        public ByteBuffer call(CassandraRow v1) {
            return v1.getBytes("rowkey");
        }
    }, ByteBuffer.class);


And then when I run the following, here is where the problem occurs:

List<Tuple2<ByteBuffer, Iterable<CassandraRow>>> listOftuples = cassandraRowsRDD.collect(); // ERROR OCCURS HERE 
    Tuple2<ByteBuffer, Iterable<CassandraRow>> tuple = listOftuples.iterator().next();
    ByteBuffer partitionKey = tuple._1();
    for(CassandraRow cassandraRow: tuple._2()) {
        System.out.println(cassandraRow.getLong("col1"));
}

So I tried this, and got the same error:

Iterable<Tuple2<ByteBuffer, Iterable<CassandraRow>>> listOftuples = cassandraRowsRDD.collect(); // ERROR OCCURS HERE 
    Tuple2<ByteBuffer, Iterable<CassandraRow>> tuple = listOftuples.iterator().next();
    ByteBuffer partitionKey = tuple._1();
    for(CassandraRow cassandraRow: tuple._2()) {
        System.out.println(cassandraRow.getLong("col1"));
}

I have also tried cassandraRowsRDD.collect().forEach() and cassandraRowsRDD.stream().forEachPartition(), and the exact same error occurs.

May I know how I can fix this?

Thanks,
kant

Russell Spitzer

Oct 9, 2016, 9:39:21 PM
to spark-conn...@lists.datastax.com
ByteBuffers aren't serializable, so you can't collect them. At least that's my first guess.
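That guess is easy to check in plain Java (a standalone check I added, not code from the thread): ByteBuffer does not implement Serializable, while arrays do, which is why switching the key type to byte[] is a plausible workaround.

```java
import java.io.Serializable;
import java.nio.ByteBuffer;

public class SerializableCheck {
    public static void main(String[] args) {
        // java.nio.ByteBuffer does not implement Serializable, so Java
        // serialization (which Spark uses by default when shipping collected
        // results back to the driver) cannot handle it as an RDD key.
        System.out.println(Serializable.class.isAssignableFrom(ByteBuffer.class)); // false
        // A plain byte[] is serializable (all Java array types are), which is
        // why changing the spanBy key to byte[] is the usual workaround.
        System.out.println(Serializable.class.isAssignableFrom(byte[].class));     // true
    }
}
```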

kant kodali

Oct 9, 2016, 10:19:02 PM
to spark-conn...@lists.datastax.com
Changed everything to byte[] arrays so there are no more ByteBuffers in the code, but the exact same error persists. Still trying to debug further.

Russell Spitzer

Oct 9, 2016, 10:29:22 PM
to spark-conn...@lists.datastax.com
Apparently this is another effect of using the system classpath instead of --jars, or so says
http://stackoverflow.com/questions/35529490/spark-1-6-0-throwing-classcast-exception-in-cluster-mode-works-fine-in-local-mod

Russell Spitzer

Oct 9, 2016, 10:30:28 PM
to spark-conn...@lists.datastax.com

Russell Spitzer

Oct 9, 2016, 10:34:31 PM
to spark-conn...@lists.datastax.com

kant kodali

Oct 9, 2016, 10:55:03 PM
to spark-conn...@lists.datastax.com
Hi Russell,

Thanks for the effort. I did look into these and I am still scratching my head. I am running everything locally in standalone mode, so my Spark cluster is just running on localhost.

Scala code runner version 2.11.8  // when I run scala -version or even ./spark-shell


compile group: 'org.apache.spark' name: 'spark-core_2.11' version: '2.0.0'
compile group: 'org.apache.spark' name: 'spark-streaming_2.11' version: '2.0.0'
compile group: 'org.apache.spark' name: 'spark-sql_2.11' version: '2.0.0'
compile group: 'com.datastax.spark' name: 'spark-cassandra-connector_2.11' version: '2.0.0-M3'


So I don't see anything wrong with these versions.

2) One of the links says you should mark dependencies as "provided". I use Java and Gradle, so I am not sure how to do that.

3) I am bundling everything into one jar, and so far it has worked out well except for this error.

Russell Spitzer

Oct 10, 2016, 12:36:55 AM
to spark-conn...@lists.datastax.com
Are you still not using Spark Submit? When you say "bundling into one jar", what do you mean? "Provided" is a term meaning the libraries will be on the runtime classpath, so those dependencies do not have to be added to a fat jar. The errors could be because the application code is on the system classpath.
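In Gradle terms, a sketch of what that would look like (my assumption of the intended setup, not code from this thread): Gradle has no built-in "provided" scope, but compileOnly (available since Gradle 2.12) behaves the same way for this purpose.

```groovy
dependencies {
    // Keep Spark itself off the fat jar; spark-submit puts it on the
    // runtime classpath for you.
    compileOnly group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.0.0'

    // The connector can only be compileOnly if something supplies it at
    // runtime, e.g. spark-submit --packages; otherwise leave it as compile
    // so it is bundled into the fat jar.
    compile group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.11', version: '2.0.0-M3'
}
```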

Russell Spitzer

Oct 10, 2016, 12:38:09 AM
to spark-conn...@lists.datastax.com
The "version" errors could be if the StandAlone cluster you are running against has a different version of scala than the one you are compiling against. Note "Standalone" and "Local" are very different things.

kant kodali

Oct 10, 2016, 12:48:14 AM
to spark-conn...@lists.datastax.com
I am not yet using Spark Submit because there are a bunch of other projects that follow the same pattern as below.

SparkConf sparkConf = config.buildSparkConfig();
sparkConf.setJars(JavaSparkContext.jarOfClass(SparkDriver.class));
JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, new Duration(config.getSparkStremingBatchInterval()));
ssc.sparkContext().setLogLevel("ERROR");
Receiver receiver = new Receiver(config);
JavaReceiverInputDStream<String> jsonMessagesDStream = ssc.receiverStream(receiver);
jsonMessagesDStream.count();
ssc.start();
ssc.awaitTermination();


And I run gradle clean build, which builds one jar, and I just run that jar.

For the spark-cassandra-connector project I followed the same pattern as above, and I was able to read a sample row and do a count, which told me there were 1M rows (which is accurate). So this approach is indeed working, except when I use spanBy and want to print only one Cassandra partition (which has about 40 rows); that is where I get this error:

java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD

Russell Spitzer

Oct 10, 2016, 12:52:58 AM
to spark-conn...@lists.datastax.com

If it isn't working for you with your code as is, perhaps you should try it the supported way instead?

kant kodali

Oct 10, 2016, 1:02:36 AM
to spark-conn...@lists.datastax.com
Sure, but I wouldn't really say it is not working, because it has indeed worked in every case besides this one. Since I am new to Spark I don't really know how spark-submit works; there is some learning I have to do, which I am looking at right now. But the bigger question for me is not about suspecting the approach, since it worked in almost every other case, but rather why it isn't working for this one case. Anyway, looking into spark-submit.

Russell Spitzer

Oct 10, 2016, 1:06:59 AM
to spark-conn...@lists.datastax.com
There is also


JavaPairRDD<ByteBuffer, Iterable<CassandraRow>> cassandraRowsRDD= javaFunctions(sc).cassandraTable("test", "hello" )
   .select("col1", "col2", "col3" )
   .spanBy(new Function<CassandraRow, ByteBuffer>() {
        @Override
        public ByteBuffer call(CassandraRow v1) {
            return v1.getBytes("rowkey");
        }
    }, ByteBuffer.class);

There should be no field "rowkey" here, since you aren't selecting it out of the table.

Russell Spitzer

Oct 10, 2016, 1:12:30 AM
to spark-conn...@lists.datastax.com
And of course as a quick check you can always try using the spark shell (uses spark submit)

You would start the shell with something like

./bin/spark-shell --packages datastax:spark-cassandra-connector:2.0.0-M2-s_2.11 --conf spark.cassandra.connection.host=127.0.0.1 --master  spark://rspitzer-rmbp15.local:7077

then run

scala> import com.datastax.spark.connector._
import com.datastax.spark.connector._

scala> spark.sparkContext.cassandraTable("test","test").spanBy( row => row.get[Int]("k")).collect
res0: Array[(Int, Iterable[com.datastax.spark.connector.CassandraRow])] = Array((5,ArrayBuffer(CassandraRow{k: 5, v: 5})), (10,ArrayBuffer(CassandraRow{k: 10, v: 10})), (1,ArrayBuffer(CassandraRow{k: 1, v: 1})), (8,ArrayBuffer(CassandraRow{k: 8, v: 8})), (2,ArrayBuffer(CassandraRow{k: 2, v: 2})), (4,ArrayBuffer(CassandraRow{k: 4, v: 4})), (7,ArrayBuffer(CassandraRow{k: 7, v: 7})), (6,ArrayBuffer(CassandraRow{k: 6, v: 6})), (9,ArrayBuffer(CassandraRow{k: 9, v: 9})), (3,ArrayBuffer(CassandraRow{k: 3, v: 3})))

#M2 is for the Spark Packages release; don't use M2 unless you are using the packages artifact.

Russell Spitzer

Oct 10, 2016, 1:26:31 AM
to spark-conn...@lists.datastax.com
And just in case, here is the code for a blob column

scala> spark.sparkContext.cassandraTable("test","hello").spanBy( row => row.get[Array[Byte]]("rowkey")).collect

res2: Array[(Array[Byte], Iterable[com.datastax.spark.connector.CassandraRow])] = Array((Array(0, 0, 0, 1),ArrayBuffer(CassandraRow{rowkey: 0x00000001, col1: 1, col2: 1, col3: 1})), (Array(0, 0, 0, 1),ArrayBuffer(CassandraRow{rowkey: 0x00000001, col1: 2, col2: 1, col3: 1})), (Array(0, 0, 0, 1),ArrayBuffer(CassandraRow{rowkey: 0x00000001, col1: 3, col2: 1, col3: 1})), (Array(0, 0, 0, 0),ArrayBuffer(CassandraRow{rowkey: 0x00000000, col1: 1, col2: 1, col3: 1})), (Array(0, 0, 0, 0),ArrayBuffer(CassandraRow{rowkey: 0x00000000, col1: 2, col2: 1, col3: 1})), (Array(0, 0, 0, 0),ArrayBuffer(CassandraRow{rowkey: 0x00000000, col1: 3, col2: 1, col3: 1})))

kant kodali

Oct 10, 2016, 1:29:00 AM
to spark-conn...@lists.datastax.com
This is it, I think:

JavaPairRDD<byte[], Iterable<CassandraRow>> cassandraRowsRDD= javaFunctions(sc).cassandraTable("test", "hello" )
   .select("rowkey", "col1", "col2", "col3")
   .spanBy(new Function<CassandraRow, byte[]>() {
        @Override
        public byte[] call(CassandraRow v1) {
            return v1.getBytes("rowkey");
        }
    }, byte[].class);

Iterable<Tuple2<byte[], Iterable<CassandraRow>>> listOftuples = cassandraRowsRDD.collect();
Tuple2<byte[], Iterable<CassandraRow>> tuple = listOftuples.iterator().next();
byte[] partitionKey = tuple._1();
for(CassandraRow cassandraRow: tuple._2()) {
    System.out.println("************START************");
    System.out.println(new String(partitionKey));
    System.out.println("************END************");
}

I thought I was doing select col1, col2, col3 from hello where rowkey="oxab" but I clearly wasn't.

Now I get the following error. Am I not just printing one Cassandra partition with the code above? I have 1M rows in my Cassandra node.

16/10/09 22:12:07 ERROR TaskSchedulerImpl: Lost executor 0 on 192.168.1.182: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/10/09 22:12:14 ERROR TaskSchedulerImpl: Lost executor 1 on 192.168.1.182: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/10/09 22:12:23 ERROR TaskSchedulerImpl: Lost executor 2 on 192.168.1.182: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/10/09 22:12:31 ERROR TaskSchedulerImpl: Lost executor 3 on 192.168.1.182: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. 

Russell Spitzer

Oct 10, 2016, 2:03:17 AM
to spark-conn...@lists.datastax.com
The "select" operation does column pruning, so not as much data is pulled from Cassandra (please see the Spark Cassandra Connector docs):
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/3_selection.md#selecting-columns---select

Your "collect" operation (please see the Spark docs) pulls the entire RDD back into memory on the driver.
  http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD

def collect(): Array[T]

Return an array that contains all of the elements in this RDD.
Note: this method should only be used if the resulting array is expected to be small, as all the data is loaded into the driver's memory.

So you are in fact pulling all of the records back as an Array and then printing out a single member of that Array.

Your executors are failing; you need to check their logs to find out why. They will be in the executor work directory; in a default Spark install this is $SPARK_HOME/work/app-#####/executor#/[stdout|stderr]

kant kodali

Oct 10, 2016, 2:10:01 AM
to spark-conn...@lists.datastax.com
Thanks a ton for that clarification! So I dropped my entire keyspace which had 1M rows. I am now reading from only one table, and that table has 10 rows. Still the error persists:

: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 23, 192.168.1.182): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.


Now I checked the executor logs; they say the following:

ERROR CoarseGrainedExecutorBackend: Unable to create executor due to Can't assign requested address: Service 'org.apache.spark.network.netty.NettyBlockTransferService' failed after 16 retries! Consider explicitly setting the appropriate port for the service 'org.apache.spark.network.netty.NettyBlockTransferService' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
java.net.BindException: Can't assign requested address: Service 'org.apache.spark.network.netty.NettyBlockTransferService' failed after 16 retries! Consider explicitly setting the appropriate port for the service 'org.apache.spark.network.netty.NettyBlockTransferService' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries

Still debugging further.

kant kodali

Oct 10, 2016, 2:32:29 AM
to spark-conn...@lists.datastax.com
Based on the suggestions on Google for the exception in the executor logs,

java.net.BindException: Can't assign requested address: Service 'org.apache.spark.network.netty.NettyBlockTransferService' failed after 16 retries! Consider explicitly setting the appropriate port for the service 'org.apache.spark.network.netty.NettyBlockTransferService' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries

I went to spark_home/bin/load-spark-env.sh and added the following line:

export SPARK_LOCAL_IP="127.0.0.1" 

and I restarted the cluster. Now I am back to square one; I get the original error:

Caused by: java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD

But my code is the same as below

JavaPairRDD<byte[], Iterable<CassandraRow>> cassandraRowsRDD= javaFunctions(sc).cassandraTable("test", "hello" )
   .select("rowkey", "col1", "col2", "col3")
   .spanBy(new Function<CassandraRow, byte[]>() {
        @Override
        public byte[] call(CassandraRow v1) {
            return v1.getBytes("rowkey");
        }
    }, byte[].class);

Iterable<Tuple2<byte[], Iterable<CassandraRow>>> listOftuples = cassandraRowsRDD.collect();
Tuple2<byte[], Iterable<CassandraRow>> tuple = listOftuples.iterator().next();
byte[] partitionKey = tuple._1();
for(CassandraRow cassandraRow: tuple._2()) {
    System.out.println("************START************");
    System.out.println(new String(partitionKey));
    System.out.println("************END************");
}

@Russell, you pointed out not to use M2, but I am using M3. Should I change it to the one below?

compile group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.10', version: '1.6.2' 

Thanks!


Russell Spitzer

Oct 10, 2016, 12:29:09 PM
to spark-conn...@lists.datastax.com
DO NOT CHANGE TO M2. Like I said, that is only if you are using the "Packages" repository. 
https://spark-packages.org/package/datastax/spark-cassandra-connector
The maven artifact M3 is the correct one. There was an error publishing the M2 artifact to maven.

The Spark Local is yet another thing Spark Submit would be taking care of. 

Have you tried my spark shell examples yet?

What version of Spark are you running? Both I and the folks on the Spark mailing list think this is probably a version mismatch. When you run ./sbin/start-all, what version of Spark is that?

Your Gradle file is also... not so good. Check this as a reference:

https://github.com/datastax/SparkBuildExamples/blob/master/java/gradle/oss/build.gradle
If you aren't using spark-submit (and you really should be), you can't mark the connector as provided, since you need it on the runtime classpath and you can't use --packages.

kant kodali

Oct 10, 2016, 2:24:56 PM
to spark-conn...@lists.datastax.com

Hi Russell,

Responded to the questions inline

On Mon, Oct 10, 2016 9:28 AM, Russell Spitzer russell...@gmail.com wrote:


Have you tried my spark shell examples yet? Yes, I did this morning and it works fine. It works even if I change all my Gradle dependencies to compile, and it works when I follow the sample you gave me, although I am highly skeptical of whether my Gradle code is right, so I am attaching the files here; please let me know which looks more correct. And finally, if I don't use spark-submit, just as in the past, it doesn't work for this problem, and I don't understand why, since even a simple count on Cassandra without the spark-submit approach worked fine.
 
   ./bin/spark-submit --class com.company.batchprocessing.hello --master local[8] build/libs/batchprocessing.jar

What version of spark are you running, both me and the folks on the Spark mailing list think this is probably a version mismatch. When you run ./sbin/start-all what version of spark is that? 2.0.0

build1.gradle
builld2.gradle

Russell Spitzer

Oct 10, 2016, 2:32:10 PM
to spark-conn...@lists.datastax.com
It's probably a classloader issue that you just hit now. Previously you may have narrowly avoided it based on what objects you were serializing, the order of operations, whether the driver was involved in receiving RDD data, or a variety of other things. It is much more likely that you are just newly exercising a code path that you previously did not use. For example, imagine C code where you are writing a string into an undersized buffer. Most of the time everything will seem to work; only on certain inputs will you end up with a segmentation fault as you run past the end of the buffer and overwrite other memory on the heap.

Unfortunately I don't have a lot of time to help more on this as I have to get back to work :) It seems like you are pretty close to running, so just keep at it and I'm sure you will get there.
Russ


kant kodali

Oct 10, 2016, 2:58:42 PM
to spark-conn...@lists.datastax.com
Thanks a lot for your help. All the credit goes to you!

kant kodali

unread,
Oct 11, 2016, 1:48:58 PM10/11/16
to spark-conn...@lists.datastax.com
Hi Russell,

After reading through the spark-submit and spark-class scripts, it does indeed look like a classloader issue: spark-submit replaces the classloader via org.apache.spark.deploy, although I am not sure why this doesn't happen underneath the Spark APIs when I use code like the snippet below.
There should be a way to replace the parent or even the boot classloader, I think. Anyway, just wanted to share my thoughts.


        SparkConf sparkConf = config.buildSparkConfig();
        // Ship the jar containing the driver class to the executors.
        sparkConf.setJars(JavaSparkContext.jarOfClass(SparkDriver.class));
        JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, new Duration(1000));
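For what it's worth, the interposition I mean can be sketched in plain Java. This is only an illustration of the general pattern (a launcher loading user classes through a child classloader whose parent is the application classloader), not what org.apache.spark.deploy actually does; the class and variable names here are made up for the example.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class LoaderSketch {
    public static void main(String[] args) throws Exception {
        // The JVM's application classloader, which loaded this class.
        ClassLoader appLoader = LoaderSketch.class.getClassLoader();

        // A child loader that would normally point at the user's jars;
        // with no extra URLs it simply delegates every lookup upward.
        URLClassLoader childLoader = new URLClassLoader(new URL[0], appLoader);

        // Classes resolved through the child still come from the parent chain.
        Class<?> c = childLoader.loadClass("java.lang.String");
        System.out.println(c.getName());
        System.out.println(childLoader.getParent() == appLoader);
    }
}
```

A launcher script like spark-submit can build such a child loader over the application jars before invoking the user's main class, which is why the same code behaves differently under spark-submit than when the SparkContext is constructed directly.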

Russell Spitzer
Software Engineer



