I'm trying to load 1 billion records into Cassandra. Each record has a different partition key, so I think there is no need for batching. Here are the parameters I'm trying to tune:
"spark.cassandra.output.batch.size.rows": 1,
"spark.cassandra.output.concurrent.writes":500,
"spark.cassandra.output.throughput_mb_per_sec": 1
Are there any other parameters I need to tune in this kind of scenario so that I don't overwhelm the Cassandra cluster?
Thanks
Giri,
Have you seen this presentation? It's a gold mine.
Jim
From: Giri
Sent: Tuesday, April 18, 3:30 PM
Subject: spark cassandra write performance
To: DataStax Spark Connector for Apache Cassandra