Access to Cassandra from Spark streaming job extremely slow when Cassandra nodes are down


Antonio Ye

unread,
Jul 23, 2017, 6:07:17 PM7/23/17
to DataStax Spark Connector for Apache Cassandra
While testing Spark and Cassandra HA, we killed 3 of our 9 Cassandra nodes. Our Spark job continues to run and reports no errors, but it runs extremely slowly when we do this: each batch used to take about 16 seconds to process, but after killing the Cassandra nodes it takes several minutes. When we remove the 3 nodes from the Cassandra cluster by running nodetool removenode, performance returns to what it was before we killed them. Any ideas why this is happening?
We are using NetworkTopologyStrategy with a replication factor of 3 for our keyspace. Another thing to note is that we are running on a Kubernetes cluster in AWS, with endpoint_snitch set to Ec2Snitch.
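For reference, the keyspace is defined roughly like this (the datacenter name below is just an example; Ec2Snitch derives the actual DC name from the AWS region):

```sql
-- Illustrative keyspace DDL, not our exact schema.
-- With Ec2Snitch the datacenter name comes from the AWS region (e.g. us-east).
CREATE KEYSPACE IF NOT EXISTS my_keyspace
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'us-east': 3
  };
```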

Jim Hatcher

unread,
Jul 23, 2017, 7:29:52 PM7/23/17
to spark-conn...@lists.datastax.com
What is your rack setup?


Antonio Ye

unread,
Jul 23, 2017, 8:00:16 PM7/23/17
to spark-conn...@lists.datastax.com
We are configuring all Cassandra nodes in the same DC and rack.

Russell Spitzer

unread,
Jul 23, 2017, 9:11:42 PM7/23/17
to spark-conn...@lists.datastax.com
Could be hint storage, depending on the version... We'd probably need more details: what is the job doing, how many rows, and what is the RF? In the UI, do tasks take the same amount of time? Is there a lot of GC?

Russell Spitzer
Software Engineer





Antonio Ye

unread,
Jul 24, 2017, 11:01:33 AM7/24/17
to DataStax Spark Connector for Apache Cassandra
We are on Spark 2.1.0 and spark-cassandra-connector 2.0.0. As for what our job is doing, it does an RDD leftJoinWithCassandraTable, deletes some of the duplicate records from Cassandra, and then inserts any new records. We have RF set to 3 and did notice the tasks taking a significant amount of time. I will try to reproduce the problem and take a closer look at the Spark UI to see if we can identify the bottleneck.
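In case it helps, the Cassandra access pattern in the job is shaped roughly like this (a sketch only — the keyspace, table, column names, and host are placeholders, not our real schema):

```scala
// Sketch of the job's Cassandra access pattern (all names are placeholders).
// Assumes spark-cassandra-connector 2.0.x on the classpath.
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object DedupeSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("dedupe-sketch")
      .set("spark.cassandra.connection.host", "cassandra-host") // placeholder
    val sc = new SparkContext(conf)

    // Incoming batch of keys (in the real job this comes from the stream).
    val incoming = sc.parallelize(Seq(Tuple1("key1"), Tuple1("key2")))

    // Left join each incoming key against the existing rows in Cassandra.
    val joined = incoming.leftJoinWithCassandraTable("my_ks", "my_table")

    // Keys that matched an existing row are duplicates: delete those rows...
    joined.filter { case (_, existing) => existing.isDefined }
      .map { case (key, _) => key }
      .deleteFromCassandra("my_ks", "my_table")

    // ...then insert the new records.
    incoming.saveToCassandra("my_ks", "my_table", SomeColumns("id"))

    sc.stop()
  }
}
```

Each of these three steps (join, delete, insert) talks to Cassandra, which is presumably why every stage slows down when replicas are unreachable.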

Thanks,
Tony


Antonio Ye

unread,
Jul 25, 2017, 1:38:54 AM7/25/17
to DataStax Spark Connector for Apache Cassandra
Any idea what I should look at next? I am only processing a few hundred rows, yet every task in the Spark job that accesses Cassandra takes significantly longer when some of the nodes are down; as soon as I remove the nodes from the Cassandra cluster, processing time returns to normal.