Spark Cassandra cluster: distributing reads across the cluster

Amol Khanolkar

Aug 8, 2020, 1:57:55 PM

to DataStax Spark Connector for Apache Cassandra

I have set up a Spark Cassandra cluster:

- 3 Cassandra nodes, each running a Spark worker connected to a Spark master in standalone mode, using OSS Spark 3.0, Apache Cassandra 3.11, and Spark Cassandra Connector 3.0.0-beta

- 2 additional workers connected to the Spark cluster, running on regular (non-Cassandra) nodes

I have 5000 partition keys that I join with a Cassandra table to fetch data, using the Direct Join available in the OSS Spark Cassandra Connector.
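For context, a join of this kind might look roughly like the PySpark sketch below. The keyspace, table, and key column names are hypothetical placeholders, and the helper functions are illustrative only; the settings shown (`spark.sql.extensions`, `directJoinSetting`) are the ones the connector documents for enabling Direct Join, but this is not necessarily the exact job used here.

```python
# Sketch only: keyspace, table, and column names are hypothetical placeholders.

def direct_join_conf(contact_point):
    """Spark settings typically needed before a Direct Join can be planned."""
    return {
        # Registers the connector's Catalyst extensions, including Direct Join.
        "spark.sql.extensions": "com.datastax.spark.connector.CassandraSparkExtensions",
        # Any reachable Cassandra node; the address is an assumption.
        "spark.cassandra.connection.host": contact_point,
    }


def join_keys_with_table(spark, keys, keyspace, table, key_column):
    """Join a small set of partition keys against a Cassandra table.

    `spark` is an existing SparkSession created with the settings above.
    """
    keys_df = spark.createDataFrame([(k,) for k in keys], [key_column])
    cassandra_df = (
        spark.read.format("org.apache.spark.sql.cassandra")
        .option("keyspace", keyspace)
        .option("table", table)
        # "on" forces a direct join; the default "auto" decides by size ratio.
        .option("directJoinSetting", "on")
        .load()
    )
    # The join must cover the full partition key for a direct join to apply.
    return keys_df.join(cassandra_df, on=key_column)
```

With a live session, calling `join_keys_with_table(spark, range(5000), "my_ks", "my_table", "id").explain()` and checking the physical plan for a direct join node would show whether the optimization was actually applied.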

- I want each direct join query to be executed on the machine where the data is actually located. Currently, for some reason, it gets executed on different nodes each time. Is there any additional configuration or setting needed for this?

- When I read a lot of partition data, I see CPU usage go up from 10-15% to almost 80% while reading. Is there any Cassandra config that I can tune for lower CPU usage? I tried changing some read config parameters from https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md but haven't had success.
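As an illustration of the kind of read parameters meant here, the sketch below collects a few throttling-related settings from the connector's reference page and renders them as `spark-submit` arguments. Which parameters were actually changed is not stated above, so the selection and the helper names are assumptions, and the values passed in are placeholders rather than recommendations.

```python
# Sketch only: the parameter selection is an assumption and the values
# passed in are illustrative placeholders, not tuning recommendations.

def throttled_read_conf(reads_per_sec, concurrent_reads, fetch_size_rows):
    """Collect connector read settings that limit load on Cassandra."""
    return {
        # Max requests or pages per core per second (unlimited by default).
        "spark.cassandra.input.readsPerSec": str(reads_per_sec),
        # Read parallelism for join-style key lookups.
        "spark.cassandra.concurrent.reads": str(concurrent_reads),
        # Number of rows fetched per round trip.
        "spark.cassandra.input.fetch.sizeInRows": str(fetch_size_rows),
    }


def as_submit_args(conf):
    """Render settings as a flat list of spark-submit --conf arguments."""
    args = []
    for name, value in sorted(conf.items()):
        args += ["--conf", f"{name}={value}"]
    return args
```

For example, `as_submit_args(throttled_read_conf(100, 64, 500))` yields a list that can be appended to a `spark-submit` command line. Lower values trade read throughput for less pressure on the Cassandra nodes.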

Regards

Amol

Amol Khanolkar

Aug 11, 2020, 2:21:25 AM

to spark-conn...@lists.datastax.com

Any guidance on either of the two issues?

Regards

Amol
