Is there a way to retry only failed rows?


Sungho Kim

Oct 19, 2020, 1:45:06 AM
to DataStax Spark Connector for Apache Cassandra
Hi,
I'm trying to insert a large dataset (about 1 billion rows) into a Cassandra cluster with Spark.
I set the number of partitions to 40, so each task has 25 million rows to insert, and each task takes about 20 hours to complete.

During the insert, I encountered the following exception for some tasks:

java.io.IOException: Failed to write statements to *********. The latest exception was
  Not enough replicas available for query at consistency ALL (3 required but only 2 alive)


I also found that a failed task restarts the insert from the very beginning, taking another 20 hours to complete.

Is there any way to retry just the failed rows instead of restarting the whole task?
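
A minimal sketch of a write job like the one described, using the connector's DataFrame API; the host, keyspace, and table names are hypothetical placeholders. The write consistency level seen in the exception is controlled by the connector's spark.cassandra.output.consistency.level property:

  import org.apache.spark.sql.SparkSession

  // Hypothetical host/keyspace/table names, for illustration only.
  val spark = SparkSession.builder()
    .appName("bulk-insert")
    .config("spark.cassandra.connection.host", "cassandra-host")
    .config("spark.cassandra.output.consistency.level", "ALL") // the level from the exception above
    .getOrCreate()

  val df = spark.read.parquet("hdfs:///data/source") // ~1 billion rows

  df.repartition(40) // 40 tasks of ~25 million rows each
    .write
    .format("org.apache.spark.sql.cassandra")
    .options(Map("keyspace" -> "ks", "table" -> "tbl"))
    .mode("append")
    .save()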


Jaroslaw Grabowski

Oct 19, 2020, 2:53:31 AM
to spark-conn...@lists.datastax.com
I don't think there is. You could increase the number of partitions so that retries cover smaller chunks of work.
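
A sketch of that suggestion, reusing the hypothetical df from the question above: raising the partition count from 40 to, say, 400 shrinks each task to roughly 2.5 million rows, so a failed task redoes far less work:

  // Hypothetical partition count; smaller tasks make retries cheaper.
  df.repartition(400) // ~2.5M rows per task instead of ~25M
    .write
    .format("org.apache.spark.sql.cassandra")
    .options(Map("keyspace" -> "ks", "table" -> "tbl"))
    .mode("append")
    .save()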



--
Jaroslaw Grabowski
