I am doing big data processing (~5 GB of storage, 30M rows, 16 columns) with a Cassandra, Spark, and Spark SQL stack. I have set up a 4-node cluster (8 cores, 16 GB RAM per node), with Spark and Cassandra deployed on the same nodes. When I try to perform ad-hoc query processing on the dataset using Spark SQL (via sparklyr), I usually get one of these errors: "java.lang.OutOfMemoryError: Java heap space" or "java.lang.OutOfMemoryError: GC overhead limit exceeded".
Spark configuration (set through sparklyr):
  spark.executor.memory = "2G"
  spark.driver.memory   = "4G"
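For completeness, here is roughly how I pass those values through sparklyr; the master URL is a placeholder for my cluster, and the values are what I am currently experimenting with, not recommendations:

  library(sparklyr)

  conf <- spark_config()
  conf$spark.executor.memory <- "2G"   # heap for each executor JVM
  conf$spark.driver.memory   <- "4G"   # heap for the driver JVM (must be set before connecting)

  sc <- spark_connect(master = "spark://<master-host>:7077", config = conf)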
Cassandra configuration:
  MAX_HEAP_SIZE="4G" (set in the cassandra-env.sh file)
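As far as I understand, Cassandra's heap settings live in conf/cassandra-env.sh rather than cassandra.yaml, and the comments in that file say both values must be set together. The HEAP_NEWSIZE value below is illustrative only (the file suggests roughly 100 MB per core, so 800M for 8 cores):

  # conf/cassandra-env.sh
  MAX_HEAP_SIZE="4G"
  HEAP_NEWSIZE="800M"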
My questions are:
1. When we combine multiple big data tools like this, how and where (Cassandra / Spark / Spark SQL) should the heap sizes be configured?
2. Down the line we are going to build a web-based REST API for ad-hoc processing; will heap sizing be a problem there as well?
During ad-hoc query processing I found that the errors occur when I try to collect a large number of rows after filtering/selection. I have increased the driver memory as well as the overhead memory, but the issue still persists.

Can pagination at the Spark SQL level help me overcome this problem?

Can toLocalIterator be applied to a Spark DataFrame?
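For context on what I mean by pagination: the idea is to pull the filtered result back to the driver one page at a time instead of with one big collect(). An untested sketch in sparklyr, assuming filtered_df is the Spark DataFrame left after my filtering (sdf_with_sequential_id() adds 1-based row ids):

  library(sparklyr)
  library(dplyr)

  paged     <- sdf_with_sequential_id(filtered_df, id = "row_id") %>%
    compute()                        # cache so each page does not recompute the filter
  n_rows    <- sdf_nrow(paged)
  page_size <- 100000L

  for (start in seq(1L, n_rows, by = page_size)) {
    page <- paged %>%
      filter(row_id >= !!start, row_id < !!(start + page_size)) %>%
      collect()                      # only one page is held in driver memory at a time
    # ... process `page` locally, then move on
  }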
I tried a lot but was not able to find the same functionality in sparklyr; I asked on the sparklyr mailing list as well. Is it available?
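One workaround I am considering, since sparklyr can call arbitrary JVM methods through invoke(), is to reach the underlying Dataset directly. This is an untested sketch: spark_dataframe() and invoke() are real sparklyr functions, but driving toLocalIterator() this way is my own assumption, and how each Row field is converted back to R may vary:

  library(sparklyr)
  library(dplyr)

  # df: a sparklyr table reference (assumed)
  it <- df %>%
    spark_dataframe() %>%            # underlying JVM Dataset<Row>
    invoke("toLocalIterator")        # java.util.Iterator<Row>

  while (invoke(it, "hasNext")) {
    row   <- invoke(it, "next")      # one org.apache.spark.sql.Row at a time
    first <- invoke(row, "get", 0L)  # fields by 0-based position
    # ... process the row; only one partition sits on the driver at once
  }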