I am doing big data processing (~5 GB of storage, 30M rows, 16 columns) with a Cassandra, Spark, and Spark SQL stack. I have set up a 4-node cluster (8 cores, 16 GB RAM per node), with Spark and Cassandra deployed on the same nodes. When I try to perform ad-hoc query processing on the dataset using Spark SQL (via sparklyr), I usually get one of these errors: "java.lang.OutOfMemoryError: Java heap space" or "java.lang.OutOfMemoryError: GC overhead limit exceeded".
Spark configuration (set through sparklyr):
  spark.executor.memory = "2G"
  spark.driver.memory   = "4G"
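For completeness, here is roughly how I pass those values through sparklyr; the master URL is a placeholder for my cluster, and the values are what I am currently experimenting with, not recommendations:

  library(sparklyr)

  conf <- spark_config()
  conf$spark.executor.memory <- "2G"   # heap for each executor JVM
  conf$spark.driver.memory   <- "4G"   # heap for the driver JVM (must be set before connecting)

  sc <- spark_connect(master = "spark://<master-host>:7077", config = conf)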
Cassandra configuration:
  MAX_HEAP_SIZE="4G" (set in the cassandra-env.sh file)
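As far as I understand, Cassandra's heap settings live in conf/cassandra-env.sh rather than cassandra.yaml, and the comments in that file say both values must be set together. The HEAP_NEWSIZE value below is illustrative only (the file suggests roughly 100 MB per core, so 800M for 8 cores):

  # conf/cassandra-env.sh
  MAX_HEAP_SIZE="4G"
  HEAP_NEWSIZE="800M"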
My questions are:
1. When we combine multiple big data tools like this, how and where (Cassandra / Spark / Spark SQL) should the heap sizes be configured?
2. Down the line we are going to build a web-based REST API for ad-hoc processing; will heap sizing be a problem there as well?
During ad-hoc query processing I found that the errors occur when I try to collect a large number of rows after filtering/selection. I have increased the driver memory as well as the overhead memory, but the issue still persists.

Can pagination at the Spark SQL level help me overcome this problem?

Can toLocalIterator be applied to a Spark DataFrame?
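For context on what I mean by pagination: the idea is to pull the filtered result back to the driver one page at a time instead of with one big collect(). An untested sketch in sparklyr, assuming filtered_df is the Spark DataFrame left after my filtering (sdf_with_sequential_id() adds 1-based row ids):

  library(sparklyr)
  library(dplyr)

  paged     <- sdf_with_sequential_id(filtered_df, id = "row_id") %>%
    compute()                        # cache so each page does not recompute the filter
  n_rows    <- sdf_nrow(paged)
  page_size <- 100000L

  for (start in seq(1L, n_rows, by = page_size)) {
    page <- paged %>%
      filter(row_id >= !!start, row_id < !!(start + page_size)) %>%
      collect()                      # only one page is held in driver memory at a time
    # ... process `page` locally, then move on
  }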
I tried a lot but was not able to find the same functionality in sparklyr; I asked on the sparklyr mailing list as well. Is it available?
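One workaround I am considering, since sparklyr can call arbitrary JVM methods through invoke(), is to reach the underlying Dataset directly. This is an untested sketch: spark_dataframe() and invoke() are real sparklyr functions, but driving toLocalIterator() this way is my own assumption, and how each Row field is converted back to R may vary:

  library(sparklyr)
  library(dplyr)

  # df: a sparklyr table reference (assumed)
  it <- df %>%
    spark_dataframe() %>%            # underlying JVM Dataset<Row>
    invoke("toLocalIterator")        # java.util.Iterator<Row>

  while (invoke(it, "hasNext")) {
    row   <- invoke(it, "next")      # one org.apache.spark.sql.Row at a time
    first <- invoke(row, "get", 0L)  # fields by 0-based position
    # ... process the row; only one partition sits on the driver at once
  }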