Hi Sindhuja,
Let me speak to your first error. Let's say your table is in a keyspace that is using a replication factor of 3. That means that when the data was written, Cassandra will have tried to write it to three servers in the cluster. Now, when you're querying the data, you're using a read consistency level of 1 (I know that because the error message cites "LOCAL_ONE"). To satisfy your query, Cassandra needs a response from just one of the three servers where the data exists to be able to consider this a good read. In this case, none of the three servers was able to respond.
I don't think the issue you're having is that your data is inconsistent between nodes. I think the problem you're having is that some (or all) of your Cassandra nodes are down or are being overloaded.
If the replication factor of your keyspace is actually 1, then this error is more likely to happen because there is only one copy of the data. So, if that's the case, you can help address this error by increasing the replication factor of your keyspace to 2 or 3.
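For reference, the change itself is a one-line ALTER followed by a repair so the existing data gets copied to the new replicas. A sketch only: the keyspace name is taken from your error message, and SimpleStrategy/RF 3 are assumptions you'd adjust to fit your cluster (use NetworkTopologyStrategy if you have multiple datacenters):

```shell
# Raise the replication factor (keyspace name, strategy, and factor
# are assumptions -- substitute your own)
cqlsh -e "ALTER KEYSPACE edw_data_import
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};"

# Stream the existing data out to the new replicas
nodetool repair edw_data_import
```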
I'm not sure exactly what is going on with your second exception. It might be helpful if you included more of the error message.
Regarding your first message, can you include the following information:
1) What is the replication factor of your keyspace?
2) Can you show the output of a nodetool status?
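In case it's handy, both of those can be pulled from the command line. The keyspace name below is just taken from your error message, so substitute your own:

```shell
# Show the keyspace definition, including its replication settings
# (keyspace name is an assumption from the error message)
cqlsh -e "DESCRIBE KEYSPACE edw_data_import;"

# Show which nodes are up/down and how data is distributed
nodetool status
```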
Thanks,
Jim
From: spark-connector-user@lists.datastax.com <spark-connector-user@lists.datastax.com> on behalf of Sindhuja Balaji <sindhuja....@gmail.com>
Sent: Tuesday, October 25, 2016 9:43 PM
I was getting the exception below, so I changed the replication factor for the keyspace and ran nodetool repair on the keyspace. The repair shows some errors in system.log. How do we fix the errors, or is the node repaired?
Error 1:
Caused by: com.datastax.driver.core.exceptions.UnavailableException: Not enough replicas available for query at consistency LOCAL_ONE (1 required but only 0 alive)
Error 2 on Repair:
com.google.common.util.concurrent.UncheckedExecutionException: org.apache.cassandra.exceptions.RepairException: [repair #ee4bc900-9b28-11e6-997b-e5311ab44422 on edw_data_import/tr_otp_topic_obj_assmnt, [(-5243341462767007283,-5181027973334775002], 5580759236169,-1908636647698583176], (-2288645573410771832,-2277883262246275075], (-664127392650676409,-636015452303436947], (3577877081120993756,3579814815688904558]]] Validation failed in
Any help for fixing Error 2?
Thanks,
Sindhuja
--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsub...@lists.datastax.com.
Sindhuja,
You could try increasing the replication factor to 2. That means that when you run a query with a read consistency level of 1, the Spark Cassandra connector will have two chances to get the data before throwing that error. Keep in mind that you'll be doubling the size of the data in your cluster.
I was asking for the output of the nodetool status because I was trying to get an idea of how many servers were in your cluster and whether they were up or down. For instance, here is what I get when I run a nodetool status:
[jhatcher@someserver1 ~]$ nodetool status
Datacenter: DataCenter1
===========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.0.0.7 486.55 GB 64 ? a2307045-71f9-4d8f-bb30-147687d95dc0 RAC1
UN 10.0.0.6 772.68 GB 64 ? e2b1a46a-b73a-465d-8b7a-3f2b8b3ac578 RAC1
UN 10.0.0.1 694.04 GB 64 ? fb0c4a02-43cb-4631-be47-e5184cf00c86 RAC1
UN 10.0.0.3 633.85 GB 64 ? e26cb2dc-c452-447e-b0c3-9de0f9b5e335 RAC1
UN 10.0.0.2 741.95 GB 64 ? 48483131-bbb8-49c2-8edd-60983985155d RAC1
UN 10.0.0.5 663.09 GB 64 ? d362278e-5f65-4428-beb6-779922f7e7f5 RAC1
UN 10.0.0.4 599.4 GB 64 ? 8f404aad-3a5d-4659-882f-239e331e071a RAC1
You can see that I have 7 nodes and that they are all "UN" (which means Up and Normal).
Regarding your logs, I see this error:
ERROR [ValidationExecutor:2] 2016-10-25 21:05:38,832 CompactionManager.java:1320 - Cannot start multiple repair sessions over the same sstables
You have some other errors too, but I suspect they're being caused by trying to run two repairs simultaneously.
BTW, a quick way to see all the errors in a log is to run a command like this: grep ERROR logfile
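As a concrete (made-up) illustration, using a throwaway sample log -- on a real node the file is typically /var/log/cassandra/system.log:

```shell
# Create a tiny sample log to demonstrate against
cat > /tmp/sample_system.log <<'EOF'
INFO  [main] 2016-10-25 21:00:01 StorageService.java:600 - Node started
ERROR [ValidationExecutor:2] 2016-10-25 21:05:38 CompactionManager.java:1320 - Cannot start multiple repair sessions over the same sstables
WARN  [SharedPool-Worker-4] 2016-10-25 21:06:00 BatchStatement.java:289 - Batch of prepared statements is over threshold
EOF

# Pull out only the ERROR lines (grep reads files directly, no cat needed)
grep ERROR /tmp/sample_system.log
# prints only the CompactionManager ERROR line
```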
Jim
From: spark-connector-user@lists.datastax.com <spark-connector-user@lists.datastax.com> on behalf of Sindhuja Balaji <sindhuja....@gmail.com>
Sent: Wednesday, October 26, 2016 7:55 AM
Subject: Re: replication factor error
Hi Jim,
1) What is the replication factor of your keyspace? - Currently the replication factor is 1. Should I change it to 3?
2) Can you show the output of a nodetool status? - Attached the log file for your reference
--
Thanks,
Sindhuja
OK, so your nodes are all up (which is good!) and you have three nodes (which means you can at least go to a replication factor of 2). You could also go to a replication factor of 3, but that would mean that if you lost a node, your cluster would be in trouble.
I found this article on increasing the replication factor:
https://docs.datastax.com/en/cql/3.1/cql/cql_using/update_ks_rf_t.html
And here is an article on monitoring the progress of a repair:
http://stackoverflow.com/questions/25064717/how-do-i-know-if-nodetool-repair-is-finished
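A couple of quick command-line checks can back that up. These are standard nodetool subcommands; the log path below is the usual package-install default and may differ on your boxes:

```shell
# Any "Validation" rows here mean a repair's validation phase is still running
nodetool compactionstats

# Shows data streaming between nodes, which happens during repair
nodetool netstats

# The log also records when repair sessions finish (path may differ)
grep -i repair /var/log/cassandra/system.log | tail
```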
If going to a different replication factor doesn't help, you may have to look at adding more horsepower to your cluster, or throttling your Spark process.
Russell has a good video on doing tuning of the Spark Cassandra connector here:
https://www.youtube.com/watch?v=cKIHRD6kUOc&list=PLm-EPIkBI3YoiA-02vufoEj4CgYvIQgIk&index=105
Jump to about minute 17.
.set("spark.cassandra.output.batch.size.rows", "5120")
WARN [SharedPool-Worker-4] 2016-09-29 10:45:07,294 BatchStatement.java:289 - Batch of prepared statements for [edw_data_import.tr_otp_topic_assmnt_201608] is of size 7618, exceeding specified threshold of 5120 by 2498.
WARN [SharedPool-Worker-6] 2016-09-29 10:45:07,323 BatchStatement.java:289 - Batch of prepared statements for [edw_data_import.tr_otp_topic_passmnt_tmpl_201608] is of size 11882, exceeding specified threshold of 5120 by 6762.
From: spark-connector-user@lists.datastax.com <spark-connector-user@lists.datastax.com> on behalf of Sindhuja Balaji <sindhuja....@gmail.com>
Sent: Wednesday, October 26, 2016 8:38 AM
--
Thanks,
Sindhuja
Sindhuja,
Here is an article regarding that (with an answer by Russell Spitzer -- who you should always listen to!):
http://stackoverflow.com/questions/27039398/datastax-enterprise-spark-cassandra-batch-size
You might consider not setting spark.cassandra.output.batch.size.rows (which will tell the connector to look at the bytes setting instead) and then setting spark.cassandra.output.batch.size.bytes to some larger value (like 256K maybe?).
Also, Russell mentions in his answer that you can adjust this setting in the cassandra.yaml:
batch_size_warn_threshold_in_kb
It's just a warning though. I don't think it means that your writes are failing.
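A sketch of what that could look like at submit time -- the jar name is a placeholder, and 262144 bytes is just the 256K suggested above:

```shell
# Size batches by bytes instead of rows: remove the
# spark.cassandra.output.batch.size.rows setting from your code, then:
spark-submit \
  --conf spark.cassandra.output.batch.size.bytes=262144 \
  your_spark_job.jar
```

And if you'd rather just quiet the warning instead, batch_size_warn_threshold_in_kb in cassandra.yaml defaults to 5 (KB), which is where the 5120-byte threshold in your logs comes from.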