SPARK_MEM question


seanm

Oct 27, 2012, 3:57:18 AM
to spark...@googlegroups.com
I am testing out Spark and keep running into this error when starting the master:
Could not reserve enough space for object heap

My cluster is as follows:
1 master: 4 GB memory, 2 cores
6 slaves: 16 GB memory, 8 cores each

If I set SPARK_MEM to near 16 GB for the slaves, the master also picks up this setting and I get that error when starting the master. If I set SPARK_MEM to 2 GB only on the master, it starts up OK... but then my slaves only use 2 GB of memory!

So my question is: is the Spark master meant to be sized the same as the slaves? Am I handicapping the cluster by running a smaller master?


Thanks!

Reynold Xin

Oct 27, 2012, 4:00:01 PM
to spark...@googlegroups.com
If you set SPARK_MEM in the master's spark-env.sh and set SPARK_MEM again in the slaves' spark-env.sh, it should be fine to use two different values.
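
For example, something like this (a rough sketch; the values are illustrative, and the file location assumes the standard conf/spark-env.sh on each machine):

# conf/spark-env.sh on the master
export SPARK_MEM=2g

# conf/spark-env.sh on each slave
export SPARK_MEM=14g

Each daemon sources its own local spark-env.sh when it starts, which is why the master and the slaves can end up with different heap sizes.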

seanm

Oct 27, 2012, 6:30:30 PM
to spark...@googlegroups.com
Ahh, thanks. I think part of my issue was that I was launching spark-shell from the master (so it used the master's config when launching the workers). That makes sense now. I will connect to it remotely; that should fix my issue.

Also: is it OK to give the master fewer cores and less memory, or should I be giving it more? I'm not sure what workloads the master handles, or what is normal to give it resource-wise.

Reynold Xin

Oct 27, 2012, 7:53:56 PM
to spark...@googlegroups.com
In most cases, the master is only responsible for the control plane (e.g. scheduling) and isn't involved in the data flow. However, if you trigger an action (e.g. count, reduce), some data does get routed back to the master. Utilization of the master really depends on your workload; if you have a small number of nodes (fewer than a hundred), the master probably has no visible load.


--
Reynold Xin

seanm

Oct 28, 2012, 4:20:09 AM
to spark...@googlegroups.com
OK, that makes sense. I'm still having an issue, though. The VM I'm launching spark-shell from has 2 GB of memory.

If I try launching with say:

MASTER=spark://....:7077 SPARK_MEM=14g ./spark-shell

I can see from the UI's cluster summary that my workers are in fact using 14g. However, it also seems to launch spark-shell itself with this setting(!). My shell becomes unusable because it only has 2 GB of memory (see the error output at the bottom). Is this the intended behavior for spark-shell currently? If it helps, my spark-env.sh has:

if [ -z "$SPARK_MEM" ]
then
export SPARK_MEM="<MEM>"
fi

where <MEM> is 2g on the master and 14g on the slaves. If I call spark-shell without specifying SPARK_MEM, my workers only use 512 MB.


Output:

scala> [WARN] Failed to query stty columns
java.io.IOException: Cannot run program "sh": java.io.IOException: error=12, Cannot allocate memory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:475)
        at java.lang.Runtime.exec(Runtime.java:610)
        at java.lang.Runtime.exec(Runtime.java:483)
        at scala.tools.jline.internal.TerminalLineSettings.exec(TerminalLineSettings.java:178)
        at scala.tools.jline.internal.TerminalLineSettings.exec(TerminalLineSettings.java:168)
        at scala.tools.jline.internal.TerminalLineSettings.stty(TerminalLineSettings.java:163)
        at scala.tools.jline.internal.TerminalLineSettings.get(TerminalLineSettings.java:67)
        at scala.tools.jline.internal.TerminalLineSettings.getProperty(TerminalLineSettings.java:87)
        at scala.tools.jline.UnixTerminal.getWidth(UnixTerminal.java:94)
        at scala.tools.jline.console.ConsoleReader.drawBuffer(ConsoleReader.java:582)
        at scala.tools.jline.console.ConsoleReader.drawBuffer(ConsoleReader.java:601)
        at scala.tools.jline.console.ConsoleReader.putChar(ConsoleReader.java:540)
        at scala.tools.jline.console.ConsoleReader.readLine(ConsoleReader.java:1430)
        at scala.tools.jline.console.ConsoleReader.readLine(ConsoleReader.java:1161)
        at spark.repl.SparkJLineReader.readOneLine(SparkJLineReader.scala:72)
        at scala.tools.nsc.interpreter.InteractiveReader$class.readLine(InteractiveReader.scala:44)
        at spark.repl.SparkJLineReader.readLine(SparkJLineReader.scala:19)
        at spark.repl.SparkILoop.readOneLine$1(SparkILoop.scala:564)
        at spark.repl.SparkILoop.loop(SparkILoop.scala:576)
        at spark.repl.SparkILoop.process(SparkILoop.scala:879)
        at spark.repl.SparkILoop.process(SparkILoop.scala:894)
        at spark.repl.Main$.main(Main.scala:14)
        at spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at scala.tools.nsc.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:78)
        at scala.tools.nsc.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:24)
        at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:88)
        at scala.tools.nsc.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:78)
        at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:101)
        at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:33)
        at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:40)
        at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:56)
        at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:80)
        at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:89)
        at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
Caused by: java.io.IOException: java.io.IOException: error=12, Cannot allocate memory
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:164)
        at java.lang.ProcessImpl.start(ProcessImpl.java:81)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:468)
        ... 37 more





Matei Zaharia

Oct 28, 2012, 4:22:39 AM
to spark...@googlegroups.com
The reason is that the master passes its SPARK_MEM value to the workers too. The easiest way to fix it is to remove the if [ -z "$SPARK_MEM" ] guard and just set it unconditionally to 2 GB on the master and 14 GB on the workers. Unfortunately this will not be reported correctly to Mesos (it will think you're only using 2 GB), but it's the only way to do this right now if you run spark-shell. If you run your own standalone program, its memory can be set separately from SPARK_MEM.
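
Concretely, one way to read this (a sketch; it reuses the 2g/14g values and the spark-env.sh file quoted earlier in the thread):

# conf/spark-env.sh on the master (and on the machine that launches spark-shell)
# -- unconditional, no `if [ -z "$SPARK_MEM" ]` guard
export SPARK_MEM=2g

# conf/spark-env.sh on each worker
export SPARK_MEM=14g

With the guard removed, each machine forces its own value rather than inheriting whatever the launching side passed along. For a standalone driver program, the driver JVM's heap is presumably whatever you launch it with (e.g. -Xmx), which is how it can be sized separately from SPARK_MEM.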

Matei

seanm

Oct 28, 2012, 4:30:44 AM
to spark...@googlegroups.com
OK, thanks. I will try that and do it in a standalone program.

Sean

seanm

Oct 28, 2012, 6:00:13 AM
to spark...@googlegroups.com
I have two Spark standalone clusters set up. I can run the examples and spark-shell between the clusters just fine. But when I run the examples or spark-shell from my computer at home over the VPN, it connects and registers a job on the master, but then instantly disconnects: http://pastebin.com/qcxynYeh

I'm wondering if this is the culprit, since it's binding to my home network's IP, which my Spark clusters can't see:
12/10/28 03:40:24 INFO HttpBroadcast: Broadcast server started at http://10.0.1.3:63774

Does the driver program need to be able to accept inbound connections from the Spark nodes in order to function?

Sorry for all the questions; I'm having a blast playing with Spark and just trying to get my bearings.

Sean

Reynold Xin

Oct 29, 2012, 5:04:39 AM
to spark...@googlegroups.com
Yes, the slave nodes connect to the driver to register themselves. The driver program does need to be able to accept inbound connections.
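
One quick way to sanity-check this (a sketch; 10.0.1.3:63774 is just the HttpBroadcast address from the log quoted above, and the actual host and ports will differ per run) is to test from a slave whether it can open a TCP connection back to the driver:

# run on one of the slave nodes
nc -z -w 5 10.0.1.3 63774 && echo reachable || echo unreachable

If that fails, the slaves can't route back to the address the driver is advertising, which would match the VPN symptom above.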

--
Reynold Xin


