On Wed, Mar 8, 2017 at 6:10 PM Antonio Ye <anto...@gmail.com> wrote:
I have a problem when I try to insert rows into Cassandra using saveToCassandra with writeConf = WriteConf(ifNotExists = true). I have an RDD for a single partition key. When I call saveToCassandra without that setting, I end up with 8675 records in Cassandra, which is the correct number of unique rows based on my primary key. When I specify writeConf = WriteConf(ifNotExists = true), I end up with 8642 records after the first call to saveToCassandra; if I call saveToCassandra a second time, I end up with the correct number of rows. I also tried changing the batchSize, and when I set it to 100 I end up with 8578 rows. Anyone have an idea of what is going on here?
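For reference, a minimal sketch of the writes being compared, assuming a placeholder keyspace/table ("ks", "events") and an already-built rows RDD standing in for the real job:

    import com.datastax.spark.connector._
    import com.datastax.spark.connector.writer.{RowsInBatch, WriteConf}

    // rows is the single-partition-key RDD described above; the keyspace,
    // table, and row type are placeholders, not the real schema.
    rows.saveToCassandra("ks", "events")          // plain save: all 8675 rows land

    rows.saveToCassandra("ks", "events",          // LWT save: some rows go missing
      writeConf = WriteConf(ifNotExists = true))

    rows.saveToCassandra("ks", "events",          // LWT save with 100 rows per batch
      writeConf = WriteConf(ifNotExists = true, batchSize = RowsInBatch(100)))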
On Wed, Mar 8, 2017 at 7:47 PM Russell Spitzer <rus...@datastax.com> wrote:
Hmm, sounds like you should try batch size 1. If not, try to nail down exactly which records are causing the failure.
On Wed, Mar 8, 2017 at 6:42 PM Antonio Ye <anto...@gmail.com> wrote:
I have two Spark jobs reading from two different sources and inserting into the same table; let's call them job A and job B. Records inserted by job A take precedence over records inserted by job B. Job B reads from Cassandra to detect duplicates and, if it finds any, discards them rather than inserting them. The problem is that during the read-and-dedup process, job A could have inserted new records, and we do not want job B to overwrite them. That's why job B is using "if not exists".
On Wed, Mar 8, 2017 at 6:34 PM Russell Spitzer <rus...@datastax.com> wrote:
Not sure I follow, can you give a more concrete example of what's happening and what you want to have happen?
On Wed, Mar 8, 2017 at 6:30 PM Antonio Ye <anto...@gmail.com> wrote:
The problem that I am trying to solve with "if not exists" is that I have another Spark Streaming job that is also inserting into the same table, and those records take precedence over the ones I am having problems with. Any ideas on how to get around that?
On Wed, Mar 8, 2017 at 6:19 PM Russell Spitzer <russell...@gmail.com> wrote:
Yes, you are still writing batches, just with "IF NOT EXISTS" appended, and that is probably the cause of the weirdness. Batches do weird things if you have the same row multiple times inside them. See
http://www.russellspitzer.com/2017/02/04/Ordering-in-Save-To_Cassandra/
for more information on that.
I think if you set your batch size to 1 you'll probably get the results you are expecting. But in the long run you should not use "IF NOT EXISTS" if you expect data within the same RDD to be in conflict; it's probably cheaper to dedupe in Spark and then write.
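A rough sketch of the "dedupe in Spark, then write" approach, assuming a hypothetical row type whose primary key is (pk, ck):

    import com.datastax.spark.connector._
    import com.datastax.spark.connector.writer.WriteConf

    case class Event(pk: String, ck: Int, value: String)   // placeholder row type

    // rows: RDD[Event] stands in for the RDD from the original post.
    // Keep exactly one row per primary key before saving, so no batch can
    // contain the same key twice.
    val deduped = rows
      .keyBy(e => (e.pk, e.ck))            // full primary key
      .reduceByKey((first, _) => first)    // arbitrary winner per key
      .values

    deduped.saveToCassandra("ks", "events",
      writeConf = WriteConf(ifNotExists = true))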
On Wed, Mar 8, 2017, Russell Spitzer <rus...@datastax.com> wrote:
Just use batch size 1 and turn up the number of concurrent writes.
Better yet, add a clustering key for the data source. That way everything from the streaming source just has a higher priority and you resolve on read, totally removing the need for any Paxos in the first place.
If you really want to debug it, you'll need to know exactly what is being run, and when: turn on debugging, and maybe set the trace probability to 1 and analyze the writes for the missing keys.
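Concretely, the single-row-batch write with more concurrent writes might look like this (assuming WriteConf exposes parallelismLevel, which backs spark.cassandra.output.concurrent.writes; 16 is just an example value):

    import com.datastax.spark.connector._
    import com.datastax.spark.connector.writer.{RowsInBatch, WriteConf}

    // One row per batch, compensated for by more concurrent in-flight writes.
    val oneRowBatches = WriteConf(
      ifNotExists = true,
      batchSize = RowsInBatch(1),
      parallelismLevel = 16)   // example value only

    rows.saveToCassandra("ks", "events", writeConf = oneRowBatches)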
On Wed, Mar 8, 2017, 8:42 PM Antonio Ye <anto...@gmail.com> wrote:
Yeah, batch size of 1 is an option, except that it makes the inserts a bit slow. The interesting part is that it's always 33 records that do not get inserted when using the default batch size. They are not the same 33 records each time, but it is always 33. Similarly, if I set the batch size to 100 rows I always get 8578, and lastly, if I set the batch size to something large like 1000, then all 8675 records get inserted. Any ideas on how to go about debugging this?
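A sketch of the clustering-key / resolve-on-read idea suggested above, using a hypothetical source_priority clustering column (schema and names are illustrative, not the real table); both writers can then do plain, non-LWT saves and readers keep the preferred source per key:

    // Hypothetical schema: the streaming job writes source_priority = 0,
    // the batch job writes source_priority = 1.
    //   CREATE TABLE ks.events (
    //     pk text, ck int, source_priority int, value text,
    //     PRIMARY KEY ((pk), ck, source_priority));

    import com.datastax.spark.connector._

    // Resolve on read: keep the row with the lowest source_priority per (pk, ck).
    // sc is the job's SparkContext.
    val resolved = sc.cassandraTable[(String, Int, Int, String)]("ks", "events")
      .select("pk", "ck", "source_priority", "value")
      .map { case (pk, ck, prio, value) => ((pk, ck), (prio, value)) }
      .reduceByKey((a, b) => if (a._1 <= b._1) a else b)
      .map { case ((pk, ck), (_, value)) => (pk, ck, value) }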