Problem with saveToCassandra when specifying writeConf = WriteConf(ifNotExists = true)

Antonio Ye

unread,

Mar 8, 2017, 9:10:36 PM3/8/17

to DataStax Spark Connector for Apache Cassandra

I have an problem when I try to insert rows into Cassandra using saveToCassandra and specifying writeConf = WriteConf(ifNotExists = true). I have an RDD for a single partition key. When I call saveToCassandra I end up with 8675 records Cassandra which is correct number of unique rows based on my primary key. When I specify writeConf = WriteConf(ifNotExists = true) I end up with 8642 records when I call saveToCassandra the first time. If I call saveToCassandra a second time I end up with the correct number of rows in Cassandra. I tried changing the batchSize and when I set it to 100 I end up with 8578 rows. Anyone have an idea of what is going on here?

Russell Spitzer

unread,

Mar 8, 2017, 9:19:12 PM3/8/17

to DataStax Spark Connector for Apache Cassandra

yes, you are still writing batches, just appended with "IF NOT EXISTS" this probably is the cause of the weirdness. Batches do weird things if you have the same row multiple time inside them.

http://www.russellspitzer.com/2017/02/04/Ordering-in-Save-To_Cassandra/

For more information on that.

I think if you set your batch size to 1 you'll probably get the results you are expecting. But long run you should not use "IF NOT EXISTS" if you expect data within the same RDD to be in conflict. It's probably cheaper to dedupe in Spark then write.

On Wed, Mar 8, 2017 at 6:10 PM Antonio Ye <anto...@gmail.com> wrote:

I have an problem when I try to insert rows into Cassandra using saveToCassandra and specifying writeConf = WriteConf(ifNotExists = true). I have an RDD for a single partition key. When I call saveToCassandra I end up with 8675 records Cassandra which is correct number of unique rows based on my primary key. When I specify writeConf = WriteConf(ifNotExists = true) I end up with 8642 records when I call saveToCassandra the first time. If I call saveToCassandra a second time I end up with the correct number of rows in Cassandra. I tried changing the batchSize and when I set it to 100 I end up with 8578 rows. Anyone have an idea of what is going on here?

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

Antonio Ye

unread,

Mar 8, 2017, 9:30:32 PM3/8/17

to DataStax Spark Connector for Apache Cassandra

The problem that I am trying to solve with the "if not exists" is that I have another spark streaming job that is also inserting to the same table and those records take precedence over the ones I am having problems with. Any ideas on how to get around that?

Russell Spitzer

unread,

Mar 8, 2017, 9:34:08 PM3/8/17

to DataStax Spark Connector for Apache Cassandra

Not sure I follow, can you give a more concrete example of what's happening and what you want to have happen?

Russell Spitzer
Software Engineer

Antonio Ye

unread,

Mar 8, 2017, 9:42:29 PM3/8/17

to DataStax Spark Connector for Apache Cassandra

I have two spark jobs reading from two different sources inserting into the same table, let's call them job A and job B. Records inserted by job A take precedence over records inserted by job B. Job B is reading from Cassandra to determine duplicates and if found discards them and will not try to insert them. The problem is that during the read and dedup process, job A could of inserted new records and we do not want job B to overwrite them. That's why job B is using the "if not exists"

Russell Spitzer

unread,

Mar 8, 2017, 10:47:17 PM3/8/17

to spark-conn...@lists.datastax.com

hmm, sounds like you should try batch size 1. If not try to nail down exactly what records are causing the failure

Antonio Ye

unread,

Mar 8, 2017, 11:42:25 PM3/8/17

to DataStax Spark Connector for Apache Cassandra

Yeah, batch size of 1 is an option except that it makes the inserts a bit slow. The interesting part is that its always 33 records that do not get inserted when using the default batch size. They are not the same 33 records that do not get inserted but always 33. Similarly if I set the batch size to 100 rows I always get 8578 and lastly if I set the batch size to something large like 1000 then all 8675 records get inserted. Any ideas on how to go about debugging this?

On Wed, Mar 8, 2017 at 7:47 PM, Russell Spitzer <rus...@datastax.com> wrote:

hmm, sounds like you should try batch size 1. If not try to nail down exactly what records are causing the failure

On Wed, Mar 8, 2017 at 6:42 PM Antonio Ye <anto...@gmail.com> wrote:

I have two spark jobs reading from two different sources inserting into the same table, let's call them job A and job B. Records inserted by job A take precedence over records inserted by job B. Job B is reading from Cassandra to determine duplicates and if found discards them and will not try to insert them. The problem is that during the read and dedup process, job A could of inserted new records and we do not want job B to overwrite them. That's why job B is using the "if not exists"

On Wed, Mar 8, 2017 at 6:34 PM Russell Spitzer <rus...@datastax.com> wrote:

Not sure I follow, can you give a more concrete example of what's happening and what you want to have happen?

On Wed, Mar 8, 2017 at 6:30 PM Antonio Ye <anto...@gmail.com> wrote:

The problem that I am trying to solve with the "if not exists" is that I have another spark streaming job that is also inserting to the same table and those records take precedence over the ones I am having problems with. Any ideas on how to get around that?

On Wed, Mar 8, 2017 at 6:19 PM Russell Spitzer <russell...@gmail.com> wrote:

yes, you are still writing batches, just appended with "IF NOT EXISTS" this probably is the cause of the weirdness. Batches do weird things if you have the same row multiple time inside them.

http://www.russellspitzer.com/2017/02/04/Ordering-in-Save-To_Cassandra/

For more information on that.

I think if you set your batch size to 1 you'll probably get the results you are expecting. But long run you should not use "IF NOT EXISTS" if you expect data within the same RDD to be in conflict. It's probably cheaper to dedupe in Spark then write.

On Wed, Mar 8, 2017 at 6:10 PM Antonio Ye <anto...@gmail.com> wrote:

I have an problem when I try to insert rows into Cassandra using saveToCassandra and specifying writeConf = WriteConf(ifNotExists = true). I have an RDD for a single partition key. When I call saveToCassandra I end up with 8675 records Cassandra which is correct number of unique rows based on my primary key. When I specify writeConf = WriteConf(ifNotExists = true) I end up with 8642 records when I call saveToCassandra the first time. If I call saveToCassandra a second time I end up with the correct number of rows in Cassandra. I tried changing the batchSize and when I set it to 100 I end up with 8578 rows. Anyone have an idea of what is going on here?

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

Russell Spitzer

unread,

Mar 9, 2017, 1:52:47 AM3/9/17

to DataStax Spark Connector for Apache Cassandra

Just use batch size 1 and turn up the number of concurrent writes.

Better yet, add a clustering key for the data source. That way you can everything from the streaming source just have a higher priority and resolve on read. Totally removing the need for any Paxos in the first place.

If you really want to debug it you'll need to know exactly what is being run. And when, turn on debugging and maybe just set trace probability to 1 and analyze the missing key writes

On Wed, Mar 8, 2017, 8:42 PM Antonio Ye <anto...@gmail.com> wrote:

Yeah, batch size of 1 is an option except that it makes the inserts a bit slow. The interesting part is that its always 33 records that do not get inserted when using the default batch size. They are not the same 33 records that do not get inserted but always 33. Similarly if I set the batch size to 100 rows I always get 8578 and lastly if I set the batch size to something large like 1000 then all 8675 records get inserted. Any ideas on how to go about debugging this?

On Wed, Mar 8, 2017 at 7:47 PM, Russell Spitzer <rus...@datastax.com> wrote:

hmm, sounds like you should try batch size 1. If not try to nail down exactly what records are causing the failure

On Wed, Mar 8, 2017 at 6:42 PM Antonio Ye <anto...@gmail.com> wrote:

I have two spark jobs reading from two different sources inserting into the same table, let's call them job A and job B. Records inserted by job A take precedence over records inserted by job B. Job B is reading from Cassandra to determine duplicates and if found discards them and will not try to insert them. The problem is that during the read and dedup process, job A could of inserted new records and we do not want job B to overwrite them. That's why job B is using the "if not exists"

On Wed, Mar 8, 2017 at 6:34 PM Russell Spitzer <rus...@datastax.com> wrote:

Not sure I follow, can you give a more concrete example of what's happening and what you want to have happen?

On Wed, Mar 8, 2017 at 6:30 PM Antonio Ye <anto...@gmail.com> wrote:

The problem that I am trying to solve with the "if not exists" is that I have another spark streaming job that is also inserting to the same table and those records take precedence over the ones I am having problems with. Any ideas on how to get around that?

On Wed, Mar 8, 2017 at 6:19 PM Russell Spitzer <russell...@gmail.com> wrote:

yes, you are still writing batches, just appended with "IF NOT EXISTS" this probably is the cause of the weirdness. Batches do weird things if you have the same row multiple time inside them.

http://www.russellspitzer.com/2017/02/04/Ordering-in-Save-To_Cassandra/

For more information on that.

I think if you set your batch size to 1 you'll probably get the results you are expecting. But long run you should not use "IF NOT EXISTS" if you expect data within the same RDD to be in conflict. It's probably cheaper to dedupe in Spark then write.

On Wed, Mar 8, 2017 at 6:10 PM Antonio Ye <anto...@gmail.com> wrote:

I have an problem when I try to insert rows into Cassandra using saveToCassandra and specifying writeConf = WriteConf(ifNotExists = true). I have an RDD for a single partition key. When I call saveToCassandra I end up with 8675 records Cassandra which is correct number of unique rows based on my primary key. When I specify writeConf = WriteConf(ifNotExists = true) I end up with 8642 records when I call saveToCassandra the first time. If I call saveToCassandra a second time I end up with the correct number of rows in Cassandra. I tried changing the batchSize and when I set it to 100 I end up with 8578 rows. Anyone have an idea of what is going on here?

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

Antonio Ye

unread,

Mar 9, 2017, 1:56:43 PM3/9/17

to DataStax Spark Connector for Apache Cassandra

So I spent some time this morning looking at system_traces.events and see all the "Accepting proposal Commit", "Committing proposal Commit", and "Rejecting proposal for Commit" messages for the records that do get inserted into Cassandra but see no "CAS precondition does not match current values" for any of the records that are not getting inserted. Does that mean that the Cassandra connector is not sending those insert requests and that the issue is in the driver or spark connector?

On Wed, Mar 8, 2017 at 10:52 PM, Russell Spitzer <rus...@datastax.com> wrote:

Just use batch size 1 and turn up the number of concurrent writes.

Better yet, add a clustering key for the data source. That way you can everything from the streaming source just have a higher priority and resolve on read. Totally removing the need for any Paxos in the first place.

If you really want to debug it you'll need to know exactly what is being run. And when, turn on debugging and maybe just set trace probability to 1 and analyze the missing key writes

On Wed, Mar 8, 2017, 8:42 PM Antonio Ye <anto...@gmail.com> wrote:

Yeah, batch size of 1 is an option except that it makes the inserts a bit slow. The interesting part is that its always 33 records that do not get inserted when using the default batch size. They are not the same 33 records that do not get inserted but always 33. Similarly if I set the batch size to 100 rows I always get 8578 and lastly if I set the batch size to something large like 1000 then all 8675 records get inserted. Any ideas on how to go about debugging this?

On Wed, Mar 8, 2017 at 7:47 PM, Russell Spitzer <rus...@datastax.com> wrote:

hmm, sounds like you should try batch size 1. If not try to nail down exactly what records are causing the failure

On Wed, Mar 8, 2017 at 6:42 PM Antonio Ye <anto...@gmail.com> wrote:

I have two spark jobs reading from two different sources inserting into the same table, let's call them job A and job B. Records inserted by job A take precedence over records inserted by job B. Job B is reading from Cassandra to determine duplicates and if found discards them and will not try to insert them. The problem is that during the read and dedup process, job A could of inserted new records and we do not want job B to overwrite them. That's why job B is using the "if not exists"

On Wed, Mar 8, 2017 at 6:34 PM Russell Spitzer <rus...@datastax.com> wrote:

Not sure I follow, can you give a more concrete example of what's happening and what you want to have happen?

On Wed, Mar 8, 2017 at 6:30 PM Antonio Ye <anto...@gmail.com> wrote:

The problem that I am trying to solve with the "if not exists" is that I have another spark streaming job that is also inserting to the same table and those records take precedence over the ones I am having problems with. Any ideas on how to get around that?

On Wed, Mar 8, 2017 at 6:19 PM Russell Spitzer <russell...@gmail.com> wrote:

yes, you are still writing batches, just appended with "IF NOT EXISTS" this probably is the cause of the weirdness. Batches do weird things if you have the same row multiple time inside them.

http://www.russellspitzer.com/2017/02/04/Ordering-in-Save-To_Cassandra/

For more information on that.

I think if you set your batch size to 1 you'll probably get the results you are expecting. But long run you should not use "IF NOT EXISTS" if you expect data within the same RDD to be in conflict. It's probably cheaper to dedupe in Spark then write.

On Wed, Mar 8, 2017 at 6:10 PM Antonio Ye <anto...@gmail.com> wrote:

I have an problem when I try to insert rows into Cassandra using saveToCassandra and specifying writeConf = WriteConf(ifNotExists = true). I have an RDD for a single partition key. When I call saveToCassandra I end up with 8675 records Cassandra which is correct number of unique rows based on my primary key. When I specify writeConf = WriteConf(ifNotExists = true) I end up with 8642 records when I call saveToCassandra the first time. If I call saveToCassandra a second time I end up with the correct number of rows in Cassandra. I tried changing the batchSize and when I set it to 100 I end up with 8578 rows. Anyone have an idea of what is going on here?

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

Russell Spitzer

unread,

Mar 9, 2017, 2:39:12 PM3/9/17

to DataStax Spark Connector for Apache Cassandra

It may be that the driver isn't handling batches with IF NOT EXISTS correctly? I'm really not sure. The connector is just adding "IF NOT EXISTS" to the CQL. Then prepared statements with that condition are built into a batch. So if that is malfunctioning it would be an error with the driver I think

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

Antonio Ye

unread,

Mar 9, 2017, 11:50:32 PM3/9/17

to DataStax Spark Connector for Apache Cassandra

I posted the issue in the Cassandra driver forum with a sample Spark code that shows the problem to see if anyone can help.

https://github.com/tonyye-dexcom/scala-projects/tree/master/TestCassandraConnector

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.

Antonio Ye

unread,

Mar 12, 2017, 7:07:30 PM3/12/17

to DataStax Spark Connector for Apache Cassandra

Looks like the issue is that one or more of the batches contain duplicate row and when that happens the whole batch fails and rows in that batch are not inserted. Is there any way to tell the connector or driver to retry the batch that failed but retry with a batch size of 1? If, I wanted to make this change where would I do this?

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscrib...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscrib...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscrib...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscrib...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscrib...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscrib...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscrib...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscrib...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscrib...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscrib...@lists.datastax.com.

Russell Spitzer

unread,

Mar 13, 2017, 5:08:40 PM3/13/17

to DataStax Spark Connector for Apache Cassandra

I don't think so, you'd have to make a extra custom retry policy. Be much easier to just always have a batch size of 1 and up your concurrent writers :)

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

Antonio Ye

unread,

Mar 13, 2017, 5:22:39 PM3/13/17

to DataStax Spark Connector for Apache Cassandra

I am looking for alternatives to setting the batch size of 1 because I am seeing a significant degradation in performance running locally with batch size of 1. I believe the default concurrent level is 5 and if I bump to 10 I get the following error:

17/03/13 14:20:38 ERROR QueryExecutor: Failed to execute: com.datastax.spark.connector.writer.RichBoundStatement@53b3dcf

com.datastax.driver.core.exceptions.InvalidQueryException: SERIAL is not supported as conditional update commit consistency. Use ANY if you mean "make sure it is accepted but I don't care how many replicas commit it for non-SERIAL reads"

at com.datastax.driver.core.Responses$Error.asException(Responses.java:136)

Here is the call to saveToCassandra:

rdd2.saveToCassandra(TestKeyspace, TestTable, writeConf = WriteConf(parallelismLevel = 10, batchSize = RowsInBatch(1), ifNotExists = true))

Thoughts?

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscrib...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscrib...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscrib...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscrib...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscrib...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscrib...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscrib...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscrib...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscrib...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscrib...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscrib...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscrib...@lists.datastax.com.

Russell Spitzer

unread,

Mar 13, 2017, 5:31:37 PM3/13/17

to DataStax Spark Connector for Apache Cassandra

Well like I said the best alternative is to just not use "IF NOT EXISTS" at all. Have each source tag each row it inserts, then on read resolve and only use the value from the priority source. This will be at least several fold more efficient.

As for the exception that looks like an error. I couldn't tell you why that is though since the statement should be the same as when you have write parallelism at 5.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--

Russell Spitzer
Software Engineer

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.

Reply all

Reply to author

Forward