Hi,
I'm trying to insert a large dataset (about 1 billion rows) into a Cassandra cluster with Spark.
I repartitioned the data into 40 partitions, so each task inserts about 25 million rows, and each task takes roughly 20 hours to complete.
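For context, this is roughly how the job is submitted (the class, host, and jar names below are placeholders; the property names are the standard spark-cassandra-connector output options, as far as I understand them):

```shell
# Placeholder class, contact point, and jar names.
# spark.cassandra.output.* are spark-cassandra-connector settings;
# the consistency level here matches the one in the error message.
spark-submit \
  --class com.example.CassandraInsert \
  --conf spark.cassandra.connection.host=10.0.0.1 \
  --conf spark.cassandra.output.consistency.level=ALL \
  --conf spark.cassandra.output.batch.size.rows=auto \
  my-insert-job.jar
```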
During the insert, some tasks fail with the following exception:
java.io.IOException: Failed to write statements to *********. The
latest exception was
Not enough replicas available for query at consistency ALL (3 required but only 2 alive)
I also found that a failed task restarts its insert from the very beginning, which takes another 20 hours to complete.
Is there any way to retry only the failed rows instead of restarting the whole task?