Hi,
I have set up a Spark + Cassandra cluster:
- 3 Cassandra nodes, each running a Spark worker connected to a Spark master in standalone mode (OSS Spark 3.0, Apache Cassandra 3.11 and Spark Cassandra Connector 3.0.0-beta)
- 2 additional Spark workers connected to the same Spark cluster, running on regular (non-Cassandra) nodes
I have 5000 partition keys that I join with a Cassandra table to fetch the matching rows, using the Direct Join feature available in the OSS Spark Cassandra Connector.
- I want each Direct Join query to be executed on the node where the data is actually located. Currently, for some reason, the queries are executed on different nodes each time. Is any additional configuration or setting needed for this?
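For context, Direct Join in connector 3.0 comes from the connector's Spark SQL extensions. Below is a minimal sketch of the related settings, assuming they are passed through spark-defaults.conf. It is illustrative only: the host names are placeholders, not the actual addresses of this cluster.

```
# Registers the connector's Catalyst extensions, which provide Direct Join
spark.sql.extensions                   com.datastax.spark.connector.CassandraSparkExtensions
# Placeholder: comma-separated addresses of the 3 Cassandra nodes
spark.cassandra.connection.host        <cassandra-node-1>,<cassandra-node-2>,<cassandra-node-3>
# "auto" (default) uses Direct Join only when the size-ratio check passes; "on" uses it whenever possible
spark.cassandra.sql.directJoinSetting  on
```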
Regards,
Amol