I am running Spark 0.7.2 on a five-node cluster; each machine has 64 GB of RAM and 24 cores. I started the master and the worker nodes manually (for some reason bin/start-all.sh doesn't work). I set SPARK_MEM to 60g and SPARK_WORKER_MEMORY to 64g.
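For reference, both variables are set in conf/spark-env.sh on every node, roughly as sketched below (trimmed to the two relevant lines; the rest of the file is omitted):

export SPARK_WORKER_MEMORY=64g   # total memory this worker may allocate to executors
export SPARK_MEM=60g             # JVM heap requested for each executor and the driver

To test the connectivity between the nodes, I ran a simple application: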
import java.net.InetAddress

// one task per core: 5 nodes x 24 cores = 120 partitions
val hostnames = sc.parallelize(1 to 24 * 5, 24 * 5).map(i =>
  InetAddress.getLocalHost().getHostName()
).collect()
The executors fail almost immediately; this is the driver-side log:

13/07/18 22:32:43 INFO SparkDeploySchedulerBackend: Granted executor ID app-20130718223243-0005/2 on host a.b.c.d with 24 cores, 128.0 MB RAM
13/07/18 22:32:43 INFO DAGScheduler: Got job 0 (collect at cluster.scala:23) with 120 output partitions (allowLocal=false)
13/07/18 22:32:43 INFO DAGScheduler: Final stage: Stage 0 (map at cluster.scala:21)
13/07/18 22:32:43 INFO DAGScheduler: Parents of final stage: List()
13/07/18 22:32:43 INFO Client$ClientActor: Executor updated: app-20130718223243-0005/0 is now RUNNING
13/07/18 22:32:43 INFO Client$ClientActor: Executor updated: app-20130718223243-0005/1 is now RUNNING
13/07/18 22:32:43 INFO Client$ClientActor: Executor updated: app-20130718223243-0005/0 is now FAILED (class java.io.IOException: Cannot run program "/.../lib/spark-0.7.2/run" (in directory "/.../lib/spark-0.7.2/work/app-20130718223243-0005/0"): java.io.IOException: error=12, Cannot allocate memory)
13/07/18 22:32:43 INFO SparkDeploySchedulerBackend: Executor app-20130718223243-0005/0 removed: class java.io.IOException: Cannot run program "/.../lib/spark-0.7.2/run" (in directory "/export/cnc_cup/lib/spark-0.7.2/work/app-20130718223243-0005/0"): java.io.IOException: error=12, Cannot allocate memory