We tried the settings below:

    import com.datastax.spark.connector.rdd.ReadConf
    ReadConf(splitSizeInMB = 10)

or:

    import org.apache.spark.SparkConf
    val conf = new SparkConf()
    conf.set("spark.cassandra.input.split.size_in_mb", "10")

Neither of them seems to be working. After calling repartitionByCassandraReplica we still see partitions that hold a large amount of data.
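For reference, here is a minimal sketch of how we wire the ReadConf into the table scan (the keyspace test_ks, table test_table, and the contact point are placeholders, not our real names):

    import com.datastax.spark.connector._
    import com.datastax.spark.connector.rdd.ReadConf
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("split-size-test")
      .set("spark.cassandra.connection.host", "127.0.0.1")  // placeholder contact point
      .set("spark.cassandra.input.split.size_in_mb", "10")
    val sc = new SparkContext(conf)

    // Constructing a ReadConf alone has no effect; it has to be attached to the RDD.
    val rdd = sc.cassandraTable("test_ks", "test_table")
      .withReadConf(ReadConf(splitSizeInMB = 10))

    println(s"partitions: ${rdd.getNumPartitions}")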
Thanks
Jim
From the repartitionByCassandraReplica docs:

Repartitions the data (via a shuffle) based upon the replication of the given keyspaceName and tableName. Calling this method before using joinWithCassandraTable will ensure that requests will be coordinator local.

partitionsPerHost: controls the number of Spark partitions that will be created in this repartitioning event.

The calling RDD must have rows that can be converted into the partition key of the given Cassandra table.
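A minimal sketch of that flow, assuming the table's partition key is a single column id (the case class, column name, and key range are placeholders):

    import com.datastax.spark.connector._

    // Placeholder: rows that can be converted into the table's partition key.
    case class Key(id: Int)

    val keys = sc.parallelize(1 to 1000).map(Key(_))

    // Co-locate each key with a replica before the join, so lookups are coordinator local.
    val joined = keys
      .repartitionByCassandraReplica("test_ks", "test_table", partitionsPerHost = 10)
      .joinWithCassandraTable("test_ks", "test_table")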
But the split-by-size property introduced by the connector does not seem to work even for the parent RDD. We did see that the split count setting (split.count) works.
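For reference, this is how we set the split count (200 is just an example value):

    import com.datastax.spark.connector._
    import com.datastax.spark.connector.rdd.ReadConf

    // Forcing an explicit number of Spark partitions does take effect for us.
    val rdd = sc.cassandraTable("test_ks", "test_table")
      .withReadConf(ReadConf(splitCount = Some(200)))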
Any other clues?
Jim
Our data size is about 240 GB. If we don't set anything, the default split size is 64 MB, which should give on the order of 240 * 1024 / 64 ≈ 3,840 partitions, but we saw only the default number of Spark partitions in the DataFrame. If we set a smaller value like 10 MB, the result was the same. If we set the split count instead, it works.
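Here is roughly how we read it through the DataFrame API (spark is a SparkSession; keyspace/table names are placeholders, and this assumes the connector honors spark.cassandra.* keys passed as per-read options, as in the versions we tried):

    // Per-read option override; same key we set in SparkConf above.
    val df = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map(
        "keyspace" -> "test_ks",
        "table" -> "test_table",
        "spark.cassandra.input.split.size_in_mb" -> "10"))
      .load()

    println(s"partitions: ${df.rdd.getNumPartitions}")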
Thanks
Jim
We have 10 nodes in dev, and each node runs two executors. We got 20 partitions without any settings, so I assume the default works out to one partition per executor (10 nodes × 2 executors = 20).
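A quick way to sanity-check that (sketch; spark is a SparkSession and df is the scan from before):

    // getExecutorMemoryStatus includes the driver, hence the minus one.
    val executors = spark.sparkContext.getExecutorMemoryStatus.size - 1
    println(s"executors: $executors, scan partitions: ${df.rdd.getNumPartitions}")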