Spark Cassandra Cluster: distributing reads across the cluster


Amol Khanolkar

Aug 8, 2020, 1:57:55 PM
to DataStax Spark Connector for Apache Cassandra
Hi,

I have set up a Spark/Cassandra cluster:
- 3 Cassandra nodes, each running a Spark worker connected to a Spark master in standalone mode, using OSS Spark 3.0, Apache Cassandra 3.11 and Spark Cassandra Connector 3.0.0.beta
- 2 additional Spark workers, running on non-Cassandra nodes, that are also connected to the Spark cluster

I have 5000 partition keys that I join against a Cassandra table to fetch the matching rows, using the Direct Join feature of the OSS Spark Cassandra Connector.
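For reference, a DataFrame direct join of this kind might look roughly like the sketch below. It is a minimal sketch under several assumptions: the keyspace `my_ks`, table `my_table` and a single text partition key column `id` are hypothetical, and the contact point is expected to come from `spark-submit --conf spark.cassandra.connection.host=...`. The `spark.sql.extensions` value and the `directJoinSetting` table option are taken from my reading of the connector's reference docs.

```scala
import org.apache.spark.sql.SparkSession

object DirectJoinSketch {
  // Read options for the Cassandra table; "my_ks" and "my_table" are hypothetical names.
  val tableOptions: Map[String, String] = Map(
    "keyspace" -> "my_ks",
    "table" -> "my_table",
    // "on" asks the connector to use a direct join whenever possible;
    // the default "auto" decides based on directJoinSizeRatio.
    "directJoinSetting" -> "on"
  )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("direct-join-sketch")
      // Registers the connector's Catalyst extensions, which provide Direct Join.
      .config("spark.sql.extensions", "com.datastax.spark.connector.CassandraSparkExtensions")
      .getOrCreate()
    import spark.implicits._

    // 5000 made-up keys; assumes a single text partition key column named `id`.
    val keys = (1 to 5000).map(i => s"key-$i").toDF("id")

    val table = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(tableOptions)
      .load()

    val joined = keys.join(table, Seq("id"))
    joined.explain() // inspect the physical plan to confirm a direct join was chosen
    println(joined.count())
    spark.stop()
  }
}
```

Checking `explain()` first is a cheap way to confirm the join is really being executed as a direct join before looking at where the tasks land.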

- I want each direct join query to execute on the machine where the data is actually located. Currently, for some reason, the queries run on different nodes each time. Is any additional configuration or setting needed for this?

- When I read a lot of partition data, CPU usage goes up from 10-15% to almost 80% during the read. Is there any Cassandra configuration I can tune to lower CPU usage? I tried changing some of the read parameters listed in https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md but haven't had any success.
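One possible approach to both questions, sketched with the connector's RDD API rather than the DataFrame direct join: `repartitionByCassandraReplica` groups keys by the replica that owns them and marks each Spark partition with that host as its preferred location, and `spark.locality.wait` controls how long the scheduler holds out for such a node-local slot. This assumes that Spark worker hostnames/IPs match the addresses Cassandra reports for its nodes, since otherwise no task can be considered local. The throttling settings (`spark.cassandra.input.readsPerSec`, `spark.cassandra.concurrent.reads`, `spark.cores.max`) trade throughput for lower peak load. All values, the key type `Key`, and the names `my_ks`/`my_table` are hypothetical. The contact point is again expected via `spark-submit --conf`.

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical key type: assumes a single text partition key column named `id`.
case class Key(id: String)

object LocalReadSketch {
  // All values below are illustrative starting points, not recommendations.
  def buildConf(): SparkConf =
    new SparkConf()
      .setAppName("local-read-sketch")
      // Wait longer for a node-local slot before the scheduler falls back
      // to another host (Spark's default is 3s).
      .set("spark.locality.wait", "10s")
      // Cap requests (or pages) per executor core per second; unlimited by default.
      .set("spark.cassandra.input.readsPerSec", "200")
      // Fewer concurrent per-key queries per task in joins (default 512).
      .set("spark.cassandra.concurrent.reads", "64")
      // Standalone mode: cap total cores so co-located Cassandra keeps headroom.
      .set("spark.cores.max", "6")

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(buildConf())
    val keys = sc.parallelize((1 to 5000).map(i => Key(s"key-$i")))

    // Group keys by replica so each Spark partition prefers a host that owns its data.
    val localKeys = keys.repartitionByCassandraReplica("my_ks", "my_table", partitionsPerHost = 10)

    // Per-key lookups; with the preferred locations above, tasks can run on replica nodes.
    val rows = localKeys.joinWithCassandraTable[CassandraRow]("my_ks", "my_table")
    println(rows.count())
    sc.stop()
  }
}
```

Note that throttling only spreads the same work over a longer period; if the 80% CPU comes partly from the Spark executors sharing those machines, capping cores is likely to have more effect than any Cassandra-side setting.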

Regards
Amol

Amol Khanolkar

Aug 11, 2020, 2:21:25 AM
to spark-conn...@lists.datastax.com
Hi

Any guidance on either of the 2 issues?

Regards
Amol
