What is the max value we can set for split.size? Is it based on executor memory? I understand that a Cassandra partition can never be split into multiple Spark partitions, so it is very challenging to handle hotspot C* partitions. What happens if Cassandra partitions are larger than 64 MB?
On Tue, May 29, 2018 at 10:40 PM, Russell Spitzer <rus...@datastax.com> wrote:
Yes MB is in MB :)
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/FAQ.md#how-does-the-connector-evaluate-number-of-spark-partitions
Since the connector does not know the distribution of C* partitions before fully reading them, the size estimate for a vnode range (as reported by C*) is assumed to be evenly distributed. If a single token within a range dominates the size usage of that range, then there will be imbalances.
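As a rough sketch of that estimation (illustrative numbers and function names, not the connector's actual code — the real connector reads per-range size estimates from Cassandra), the number of Spark partitions works out to roughly the total estimated table size divided by input.split.size_in_mb:

```python
# Rough sketch of how split size drives the Spark partition count, assuming
# each vnode range's size estimate is evenly distributed (as described above).
# Names and numbers are illustrative only.

def estimate_spark_partitions(range_size_estimates_mb, split_size_in_mb):
    """Group vnode token ranges into splits of roughly split_size_in_mb each."""
    total_mb = sum(range_size_estimates_mb)
    # Each split targets split_size_in_mb of (estimated) data.
    return max(1, round(total_mb / split_size_in_mb))

# e.g. 256 vnode ranges of ~4 MB each (~1024 MB total), default 64 MB splits:
print(estimate_spark_partitions([4.0] * 256, 64))  # -> 16
```

Note the imbalance Russell describes: if one token inside a 4 MB-estimated range actually holds 50 MB, that whole range still lands in one split.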
On Tue, May 29, 2018 at 11:19 AM sparkcassuser <purni...@gmail.com> wrote:
I would like to clarify the following:
What is the default value of input.split.size_in_mb? Is it in MB?
Or does the value say how many Cassandra partitions are mapped to a single Spark partition?
If so, what is the meaning of MB?
As far as I know, a single Cassandra partition cannot be divided across multiple Spark partitions.
--
You received this message because you are subscribed to the Google Groups "DataStax Spark Connector for Apache Cassandra" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-...@lists.datastax.com.
1 token = 1 Cassandra partition.
The token range mapped to a single Spark partition is defined by split.size.
The estimation of each token range's size in MB is done by the connector implicitly.
Please correct me if I am wrong.
After mapping Cassandra partitions to Spark partitions, can we repartition the Spark partitions so that each partition owns an equal number of records?
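Once the data is in Spark, a plain repartition (e.g. rdd.repartition(n)) does redistribute rows across partitions. As a toy, pure-Python illustration of the round-robin-style rebalancing this achieves (this is not the Spark API itself, just the idea):

```python
# Toy illustration (not Spark code): redistribute records from skewed
# partitions so each new partition ends up with a near-equal record count,
# which is approximately what a post-read rdd.repartition(n) gives you.

def rebalance(partitions, n):
    """Deal all records out round-robin across n new partitions."""
    new_parts = [[] for _ in range(n)]
    i = 0
    for part in partitions:
        for record in part:
            new_parts[i % n].append(record)
            i += 1
    return new_parts

skewed = [list(range(52)), [1], [2, 3]]   # one hot partition, two tiny ones
balanced = rebalance(skewed, 5)
print([len(p) for p in balanced])         # -> [11, 11, 11, 11, 11]
```

This balances record counts but costs a shuffle, so it only pays off when the downstream work per record is heavy enough to justify moving the data.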
What should be the value of split.size for the following?
Here the max partition size is ~52 MB and the min size is 180 bytes.
If I take split.size_in_mb = 1, it is still able to read the data into Spark with the correct number of rows.
How is that possible?
I thought split.size should be >= the max partition size in Cassandra.
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                      (micros)       (micros)      (bytes)
50%         0.00      0.00           0.00          446             24
75%         0.00      0.00           0.00          1916            103
95%         0.00      0.00           0.00          219342          14237
98%         0.00      0.00           0.00          3379391         182785
99%         0.00      0.00           0.00          8409007         454826
Min         0.00      0.00           0.00          180             11
Max         0.00      0.00           0.00          52066354        2816159
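One way to see why a 1 MB split size still reads a 52 MB partition correctly (a reasoning sketch with made-up tokens, not real Murmur3 values): splits divide the token ring, and every row of one Cassandra partition shares the same token, so the whole partition always falls into exactly one split — the split just ends up much bigger than its 1 MB target.

```python
# Sketch: splits are ranges over the *token ring*, and all rows of a single
# Cassandra partition share one token, so the partition lands intact in
# exactly one split regardless of split size. Tokens here are toy values.

def split_for_token(token, boundaries):
    """Return the index of the token-range split containing this token."""
    for i, upper in enumerate(boundaries):
        if token <= upper:
            return i
    return len(boundaries)

boundaries = [100, 200, 300]                   # 4 splits over a toy ring
hot_rows = [(150, row) for row in range(1000)] # 1000 rows, all token 150
splits_used = {split_for_token(t, boundaries) for t, _ in hot_rows}
print(splits_used)                             # -> {1}: one split holds it all
```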
If split.size is 1 MB, how can it accommodate a Cassandra partition of 52 MB?
On Tuesday, June 5, 2018, Russell Spitzer <rus...@datastax.com> wrote:
Any value you want is fine. Just depends on how big you want the token ranges to be.
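For reference, the split size is set through the connector's Spark configuration at submit time; a sketch of the invocation is below. The property name matches the connector's documented setting, but it has varied across connector versions, so verify it against the docs for your version.

```shell
# Hypothetical job submission: set the target split size (in MB).
# Property name per the connector docs -- check your connector version.
spark-submit \
  --conf spark.cassandra.input.split.size_in_mb=64 \
  my_job.py
```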