Is there a way to retry only failed rows?


Sungho Kim

Oct 19, 2020, 1:45:06 AM
to DataStax Spark Connector for Apache Cassandra
Hi,
I'm trying to insert a large dataset (about 1 billion rows) into a Cassandra cluster with Spark.
I set the number of partitions to 40, so each task has 25 million rows to insert, and each task takes about 20 hours to complete.

During the insert, I encountered the following exception for some tasks:

java.io.IOException: Failed to write statements to *********. The latest exception was
  Not enough replicas available for query at consistency ALL (3 required but only 2 alive)


I also found that a failed task restarts the insert from the very beginning, taking another 20 hours to complete.

Is there any way to retry just the failed rows instead of restarting the whole task?
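
A minimal sketch of a write job like the one described, using the connector's DataFrame API; the host, keyspace, and table names are hypothetical placeholders. The write consistency level seen in the exception is controlled by the connector's spark.cassandra.output.consistency.level property:

  import org.apache.spark.sql.SparkSession

  // Hypothetical host/keyspace/table names, for illustration only.
  val spark = SparkSession.builder()
    .appName("bulk-insert")
    .config("spark.cassandra.connection.host", "cassandra-host")
    .config("spark.cassandra.output.consistency.level", "ALL") // the level from the exception above
    .getOrCreate()

  val df = spark.read.parquet("hdfs:///data/source") // ~1 billion rows

  df.repartition(40) // 40 tasks of ~25 million rows each
    .write
    .format("org.apache.spark.sql.cassandra")
    .options(Map("keyspace" -> "ks", "table" -> "tbl"))
    .mode("append")
    .save()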


Jaroslaw Grabowski

Oct 19, 2020, 2:53:31 AM
to spark-conn...@lists.datastax.com
I don't think there is. You could increase the number of partitions so that retries cover smaller chunks of work.
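
A sketch of that suggestion, reusing the hypothetical df from the question above: raising the partition count from 40 to, say, 400 shrinks each task to roughly 2.5 million rows, so a failed task redoes far less work:

  // Hypothetical partition count; smaller tasks make retries cheaper.
  df.repartition(400) // ~2.5M rows per task instead of ~25M
    .write
    .format("org.apache.spark.sql.cassandra")
    .options(Map("keyspace" -> "ks", "table" -> "tbl"))
    .mode("append")
    .save()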



--
Jaroslaw Grabowski
