replication factor error


Sindhuja Balaji

Oct 25, 2016, 11:44:00 PM
to spark-conn...@lists.datastax.com
I was getting the exception below, so I changed the replication factor for the keyspace and ran nodetool repair on it. The repair shows some errors in system.log. How do we fix those errors, or is the node repaired anyway?

Error 1:

Caused by: com.datastax.driver.core.exceptions.UnavailableException: Not enough replicas available for query at consistency LOCAL_ONE (1 required but only 0 alive)
Error 2, on repair:
com.google.common.util.concurrent.UncheckedExecutionException: org.apache.cassandra.exceptions.RepairException: [repair #ee4bc900-9b28-11e6-997b-e5311ab44422 on edw_data_import/tr_otp_topic_obj_assmnt, [(-5243341462767007283,-5181027973334775002], 5580759236169,-1908636647698583176], (-2288645573410771832,-2277883262246275075], (-664127392650676409,-636015452303436947], (3577877081120993756,3579814815688904558]]] Validation failed in

Any help with fixing Error 2?

--
Thanks,
Sindhuja

Jim Hatcher

Oct 26, 2016, 9:50:55 AM
to spark-conn...@lists.datastax.com

Hi Sindhuja,


Let me speak to your first error.  Let's say your table is in a keyspace that is using a replication factor of 3.  That means that when the data was written, Cassandra will have tried to write it to three servers in the cluster.  Now, when you're querying the data, you're using a read consistency level of 1 (I know that because the error message cites "LOCAL_ONE").  To satisfy your query, Cassandra needs a response from just one of the three servers where the data exists to be able to consider this a good read.  In this case, none of the three servers was able to respond.


I don't think the issue you're having is that your data is inconsistent between nodes.  I think the problem you're having is that some (or all) of your Cassandra nodes are down or are being overloaded.


If the replication factor of your keyspace is actually 1, then this error is more likely to happen because there is only one copy of the data.  So, if that's the case, you can help address this error by increasing the replication factor of your keyspace to 2 or 3.
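
If you're not sure what the keyspace is set to right now, you can check it from cqlsh.  A quick sketch (I'm using the keyspace name from your error message, so adjust it if yours differs; the system_schema table only exists on Cassandra 3.x, and older versions keep this in system.schema_keyspaces):

    DESCRIBE KEYSPACE edw_data_import;

    SELECT keyspace_name, replication
      FROM system_schema.keyspaces
     WHERE keyspace_name = 'edw_data_import';

Either one shows you the replication strategy and the replication factor for the keyspace.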


I'm not sure exactly what is going on with your second exception.  It might be helpful if you included more of the error message.


Regarding your first message, can you include the following information:

1) What is the replication factor of your keyspace?

2) Can you show the output of a nodetool status?


Thanks,

Jim





Sindhuja Balaji

Oct 26, 2016, 9:55:44 AM
to spark-conn...@lists.datastax.com
Hi Jim,

1) What is the replication factor of your keyspace?  - Currently the replication factor is 1. Should I change it to 3?

2) Can you show the output of a nodetool status?  - I've attached the log file for your reference.





--
Thanks,
Sindhuja
log.txt

Jim Hatcher

Oct 26, 2016, 10:22:50 AM
to spark-conn...@lists.datastax.com

Sindhuja,


You could try increasing the replication factor to 2.  That means that when you run a query with a read consistency level of 1, the Spark Cassandra connector will have two chances to get the data before throwing that error.  Keep in mind that you'll be doubling the size of the data in your cluster.


I was asking for the output of the nodetool status because I was trying to get an idea of how many servers were in your cluster and whether they were up or down.  For instance, here is what I get when I run nodetool status:

[jhatcher@someserver1 ~]$ nodetool status
Datacenter: DataCenter1
===========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns    Host ID                               Rack
UN  10.0.0.7  486.55 GB  64      ?       a2307045-71f9-4d8f-bb30-147687d95dc0  RAC1
UN  10.0.0.6  772.68 GB  64      ?       e2b1a46a-b73a-465d-8b7a-3f2b8b3ac578  RAC1
UN  10.0.0.1  694.04 GB  64      ?       fb0c4a02-43cb-4631-be47-e5184cf00c86  RAC1
UN  10.0.0.3  633.85 GB  64      ?       e26cb2dc-c452-447e-b0c3-9de0f9b5e335  RAC1
UN  10.0.0.2  741.95 GB  64      ?       48483131-bbb8-49c2-8edd-60983985155d  RAC1
UN  10.0.0.5  663.09 GB  64      ?       d362278e-5f65-4428-beb6-779922f7e7f5  RAC1
UN  10.0.0.4  599.4 GB   64      ?       8f404aad-3a5d-4659-882f-239e331e071a  RAC1

You can see that I have 7 nodes and that they are all "UN" (which means Up and Normal).


Regarding your logs, I see this error:

ERROR [ValidationExecutor:2] 2016-10-25 21:05:38,832 CompactionManager.java:1320 - Cannot start multiple repair sessions over the same sstables


You have some other errors too, but I suspect they're being caused by trying to run two repairs simultaneously.


BTW, a quick way to see all the errors in a log is to do a command like this: cat logfile | grep ERROR


Jim





Sindhuja Balaji

Oct 26, 2016, 10:38:18 AM
to spark-conn...@lists.datastax.com
sindhuja.dhamodaran@cassandra104-01 ~ $ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns    Host ID                               Rack
UN  10.20.20.165  2 GB       256          ?       88e0164d-7834-4c77-9725-7df831568298  rack1
UN  10.20.20.166  2 GB       256          ?       715bc107-13c0-4aec-ad00-bc8b16a347d2  rack1
UN  10.20.20.58   2 GB       256          ?       0ced5d7b-1dc1-46ec-91a1-4aab7e36c042  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless.

This is what I see in the status.




--
Thanks,
Sindhuja

Jim Hatcher

Oct 26, 2016, 10:49:57 AM
to spark-conn...@lists.datastax.com

OK, so your nodes are all up (which is good!) and you have three nodes (which means you can at least go to a replication factor of 2).  You could also go to a replication factor of 3, but that would mean that if you lost a node, your cluster would be in trouble.


I found this article on increasing the replication factor:

https://docs.datastax.com/en/cql/3.1/cql/cql_using/update_ks_rf_t.html
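
As a rough sketch of what that change looks like here (assuming SimpleStrategy and the single data center shown in your nodetool status; adjust the keyspace name and the factor if yours differ), you'd run this in cqlsh:

    ALTER KEYSPACE edw_data_import
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};

and then, one node at a time:

    nodetool repair edw_data_import

Running the repairs one node at a time should also keep you clear of the "Cannot start multiple repair sessions over the same sstables" error from your log.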

And here is an article on monitoring the progress of a repair:

http://stackoverflow.com/questions/25064717/how-do-i-know-if-nodetool-repair-is-finished


If going to a different replication factor doesn't help, you may have to look at adding more horsepower to your cluster, or throttling your Spark process.

Russell has a good video on tuning the Spark Cassandra connector here:

https://www.youtube.com/watch?v=cKIHRD6kUOc&list=PLm-EPIkBI3YoiA-02vufoEj4CgYvIQgIk&index=105


Jump to about minute 17.


Jim



Sindhuja Balaji

Oct 26, 2016, 10:45:15 PM
to spark-conn...@lists.datastax.com
Thank you, Jim. That really helped me get a good idea of what's going on.

I changed the replication factor to 3 and I am now seeing warning messages. What would be the best practice to resolve them? Do we need to set the value below to something higher?

        .set("spark.cassandra.output.batch.size.rows", "5120")

WARN  [SharedPool-Worker-4] 2016-09-29 10:45:07,294 BatchStatement.java:289 - Batch of prepared statements for [edw_data_import.tr_otp_topic_assmnt_201608] is of size 7618, exceeding specified threshold of 5120 by 2498.

WARN  [SharedPool-Worker-6] 2016-09-29 10:45:07,323 BatchStatement.java:289 - Batch of prepared statements for [edw_data_import.tr_otp_topic_passmnt_tmpl_201608] is of size 11882, exceeding specified threshold of 5120 by 6762.





--
Thanks,
Sindhuja

Jim Hatcher

Oct 27, 2016, 10:01:22 AM
to spark-conn...@lists.datastax.com

Sindhuja,


Here is an article regarding that (with an answer by Russell Spitzer -- who you should always listen to!):

http://stackoverflow.com/questions/27039398/datastax-enterprise-spark-cassandra-batch-size


I think the idea is that you either need to set spark.cassandra.output.batch.size.rows or set spark.cassandra.output.batch.size.bytes.


You might consider not setting spark.cassandra.output.batch.size.rows (which will tell the connector to look at the bytes setting instead) and then setting spark.cassandra.output.batch.size.bytes to some larger value (like 256K maybe?)
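
For instance, a minimal sketch of the SparkConf side (the host IP is just one of the nodes from your nodetool status, and 256K is only an illustrative starting point; exact defaults can vary by connector version):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.cassandra.connection.host", "10.20.20.165")
      // No spark.cassandra.output.batch.size.rows here, so the connector
      // falls back to the bytes-based limit below.
      .set("spark.cassandra.output.batch.size.bytes", "262144")  // 256K

Bigger batches mean fewer round trips but more work per request on the coordinator, so I'd increase the value gradually rather than jumping straight to something huge.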


Also, Russell mentions in his answer that you can adjust this setting in the cassandra.yaml: batch_size_warn_threshold_in_kb
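
If you'd rather raise the threshold on the server side, it's a one-line change in cassandra.yaml on each node (the 5120-byte threshold in your warnings corresponds to this setting's default of 5; 64 below is just an illustrative value), typically followed by a rolling restart:

    # cassandra.yaml
    batch_size_warn_threshold_in_kb: 64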


It's just a warning though.  I don't think it means that your writes are failing.


Jim

