Broadcast variables not being cleaned up? / long-running SC memory issues

MLnick

Nov 25, 2013, 10:29:05 AM
to spark...@googlegroups.com
Hi 

I have an issue with clean up of stored objects that seems to be related to broadcast variables.

I have a long-running SparkContext in a job server. This server runs jobs periodically. 

At the moment my setup is that I have one SparkContext shared by all these jobs. They execute one after the other (in future I'll hopefully get the fair scheduler to work). I quickly ran into memory issues, so I put a call at the end of each job to drop all RDDs from memory. This did help, but eventually I wind up with OOM, GC overhead errors, or just a GC standstill.
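For reference, the "drop all RDDs at the end of the job" step is roughly the following sketch. It assumes a Spark version where `SparkContext.getPersistentRDDs` is available; the method name `dropAllCachedRdds` is just mine:

```scala
import org.apache.spark.SparkContext

// Sketch only: at the end of each job run, unpersist every RDD the
// shared context still has cached. getPersistentRDDs returns a map
// of RDD id -> RDD for all RDDs marked persistent on this context.
def dropAllCachedRdds(sc: SparkContext): Unit = {
  sc.getPersistentRDDs.values.foreach { rdd =>
    // blocking = true so the blocks are actually gone before the next job starts
    rdd.unpersist(blocking = true)
  }
}
```

Note this only touches RDD blocks; the broadcast_NNN blocks in the logs below are not covered by it, which is why they pile up in the MemoryStore.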

The RDDs and intermediate data are not even that big. But to show what happens, I ran the jobs every 5 minutes for a while with fairly low memory (256MB). I end up with my executor and BlockManager apparently having plenty of memory:

Executors (1)

  • Memory: 16.3 MB Used (158.2 MB Total)
  • Disk: 0.0 B Used

Executor ID | Address            | RDD blocks | Memory used        | Disk used | Active tasks | Failed tasks | Complete tasks | Total tasks
<driver>    | 192.168.0.10:62288 | 4          | 16.3 MB / 158.2 MB | 0.0 B     | 0            | 0            | 0              | 0

But the MemoryStore has almost nothing free:

16:44:44.979 [spark-server-akka.actor.default-dispatcher-3] INFO  org.apache.spark.storage.MemoryStore - Block broadcast_604 stored as values to memory (estimated size 444.6 KB, free 319.4 KB)


I then wind up with something like:

16:48:01.560 [pool-1-thread-1] INFO  o.apache.spark.storage.BlockManager - Writing block broadcast_247 to disk

16:48:01.561 [pool-1-thread-1] INFO  org.apache.spark.storage.MemoryStore - Block broadcast_247 of size 20160 dropped from memory (free 13534502)

16:48:01.561 [pool-1-thread-1] INFO  o.apache.spark.storage.BlockManager - Dropping block broadcast_246 from memory

16:48:01.561 [pool-1-thread-1] INFO  o.apache.spark.storage.BlockManager - Writing block broadcast_246 to disk

16:48:01.574 [pool-1-thread-1] INFO  org.apache.spark.storage.MemoryStore - Block broadcast_246 of size 788550 dropped from memory (free 14323052)

16:48:01.574 [pool-1-thread-1] INFO  org.apache.spark.storage.MemoryStore - Block rdd_796_0 stored as values to memory (estimated size 13.4 MB, free 316.9 KB)

16:48:01.574 [spark-akka.actor.default-dispatcher-207] INFO  o.a.s.s.BlockManagerMasterActor$BlockManagerInfo - Added rdd_796_0 in memory on 192.168.0.10:62288 (size: 13.4 MB, free: 128.5 MB)


Before I wind up with OOM:

16:48:03.404 [spark-akka.actor.default-dispatcher-207] INFO  o.a.s.s.local.LocalTaskSetManager - Loss was due to java.lang.OutOfMemoryError

java.lang.OutOfMemoryError: Java heap space



I also tried an alternate version where I create a SparkContext for each job run, so I end up with multiple SCs (one per job). Here the memory usage looks fine:

16:46:58.061 [pool-1-thread-25] INFO  org.apache.spark.storage.MemoryStore - Block rdd_2_0 stored as values to memory (estimated size 13.4 MB, free 149.7 MB)

16:46:58.062 [spark-akka.actor.default-dispatcher-48] INFO  o.a.s.s.BlockManagerMasterActor$BlockManagerInfo - Added rdd_2_0 in memory on 192.168.0.10:56180 (size: 13.4 MB, free: 149.8 MB)

However, I eventually still end up with OOM / GC overhead errors. I think this must be a memory leak I've introduced in the way I'm running the jobs from within my job server, since each SparkContext gets stopped after its job has run and a new one is started when the next job run is called.
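The per-job-context variant I'm describing is roughly this sketch (the helper name and master string are hypothetical, not my actual server code):

```scala
import org.apache.spark.SparkContext

// Sketch of the per-job variant: one SparkContext per run, always
// stopped in a finally block so a failing job can't leak the context.
def runJob(job: SparkContext => Unit): Unit = {
  val sc = new SparkContext("local[2]", "job-server-run")
  try {
    job(sc)
  } finally {
    sc.stop() // tears down the BlockManager and caches for this run
  }
}
```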


I seem to recall a while back some pull request about dropping broadcast variables from memory (similar to RDD.unpersist()) but I can't seem to find it.


Any help appreciated. In the meantime I am experimenting with "spark.cleaner.ttl" to see if that alleviates things with the broadcast variables / MemoryStore.
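For anyone else trying this: in this Spark version the TTL cleaner is configured with a system property set before the SparkContext is constructed. A minimal sketch (the 1800-second value is just an example, not a recommendation):

```scala
import org.apache.spark.SparkContext

// Sketch: enable the periodic metadata cleaner so old broadcast
// variables and RDD/shuffle metadata get dropped after the TTL.
// Must be set before the SparkContext is created.
System.setProperty("spark.cleaner.ttl", "1800") // seconds
val sc = new SparkContext("local[2]", "job-server")
```

The caveat with a long-running context is that anything older than the TTL is cleaned, so the TTL has to be longer than any single job plus the lifetime of any data it still needs.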


Nick



Reynold Xin

Nov 25, 2013, 6:33:03 PM
to spark...@googlegroups.com
The pull request was here: https://github.com/mesos/spark/pull/771



--
You received this message because you are subscribed to the Google Groups "Spark Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spark-users...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Nick Pentreath

Dec 1, 2013, 2:39:48 AM
to spark...@googlegroups.com
Thanks Reynold

Seems to have been sorted using the "spark.cleaner.ttl" after all.

Could have sworn I had already tried that, but perhaps just too many late nights at the keyboard!


Sent from Mailbox for iPhone

