Executor ID | Address | RDD blocks | Memory used | Disk used | Active tasks | Failed tasks | Complete tasks | Total tasks |
---|---|---|---|---|---|---|---|---|
<driver> | 192.168.0.10:62288 | 4 | 16.3 MB / 158.2 MB | 0.0 B | 0 | 0 | 0 | 0 |
But the MemoryStore has almost nothing free:
16:44:44.979 [spark-server-akka.actor.default-dispatcher-3] INFO org.apache.spark.storage.MemoryStore - Block broadcast_604 stored as values to memory (estimated size 444.6 KB, free 319.4 KB)
I then wind up with something like:
16:48:01.560 [pool-1-thread-1] INFO o.apache.spark.storage.BlockManager - Writing block broadcast_247 to disk
16:48:01.561 [pool-1-thread-1] INFO org.apache.spark.storage.MemoryStore - Block broadcast_247 of size 20160 dropped from memory (free 13534502)
16:48:01.561 [pool-1-thread-1] INFO o.apache.spark.storage.BlockManager - Dropping block broadcast_246 from memory
16:48:01.561 [pool-1-thread-1] INFO o.apache.spark.storage.BlockManager - Writing block broadcast_246 to disk
16:48:01.574 [pool-1-thread-1] INFO org.apache.spark.storage.MemoryStore - Block broadcast_246 of size 788550 dropped from memory (free 14323052)
16:48:01.574 [pool-1-thread-1] INFO org.apache.spark.storage.MemoryStore - Block rdd_796_0 stored as values to memory (estimated size 13.4 MB, free 316.9 KB)
16:48:01.574 [spark-akka.actor.default-dispatcher-207] INFO o.a.s.s.BlockManagerMasterActor$BlockManagerInfo - Added rdd_796_0 in memory on 192.168.0.10:62288 (size: 13.4 MB, free: 128.5 MB)
before I eventually hit the OOM:
16:48:03.404 [spark-akka.actor.default-dispatcher-207] INFO o.a.s.s.local.LocalTaskSetManager - Loss was due to java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: Java heap space
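For context, here's a stripped-down sketch of roughly what the shared-context version does. The names are made up and the broadcast-per-run pattern is a simplification of my actual job server code, but it's the shape of the thing:

import org.apache.spark.SparkContext

object SharedContextSketch {
  // One SparkContext shared by every job run (names here are illustrative only)
  val sc = new SparkContext("local[4]", "job-server")

  def runJob(lookup: Map[String, Int]): Long = {
    // Each run broadcasts its own copy of some reference data, so broadcast_0,
    // broadcast_1, ... accumulate in the driver's MemoryStore across runs.
    val bc = sc.broadcast(lookup)
    sc.parallelize(1 to 1000000)
      .map(i => i + bc.value.size)
      .count()
  }
}

Every runJob call goes through the same sc, which I assume is what pushes the broadcast_N count up into the hundreds.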
I also tried an alternate version where I create a new SparkContext for each job run, so I end up with multiple SparkContexts (one per job). Here the memory usage looks OK:
16:46:58.061 [pool-1-thread-25] INFO org.apache.spark.storage.MemoryStore - Block rdd_2_0 stored as values to memory (estimated size 13.4 MB, free 149.7 MB)
16:46:58.062 [spark-akka.actor.default-dispatcher-48] INFO o.a.s.s.BlockManagerMasterActor$BlockManagerInfo - Added rdd_2_0 in memory on 192.168.0.10:56180 (size: 13.4 MB, free: 149.8 MB)
However, I eventually end up with OOM / GC overhead limit exceeded errors there too. I suspect this is a memory leak I've introduced in the way I'm running the jobs from within my job server, since each SparkContext is stopped after its job has run and a new one is started when the next job run is called.
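The per-job version boils down to this pattern (the job body is just a placeholder; the point is the create/stop lifecycle):

import org.apache.spark.SparkContext

object PerJobContextSketch {
  // Alternate version, roughly: a fresh context per job run, stopped afterwards.
  def runJob(jobId: Int): Long = {
    val sc = new SparkContext("local[4]", "job-" + jobId)
    try {
      sc.parallelize(1 to 1000000).map(_ * 2).count()
    } finally {
      sc.stop() // stopped after each run; the next run news up another context
    }
  }
}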
I seem to recall a pull request a while back about dropping broadcast variables from memory (similar to RDD.unpersist()), but I can't seem to find it.
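To make the asymmetry concrete, this is roughly what I mean (sketch only, with made-up names):

import org.apache.spark.SparkContext

object UnpersistComparison {
  def demo(sc: SparkContext): Unit = {
    // Cached RDD blocks can be dropped from the MemoryStore explicitly:
    val cached = sc.parallelize(1 to 1000000).cache()
    cached.count()
    cached.unpersist() // the rdd_N_* blocks are removed from memory

    // Broadcast blocks just accumulate; I haven't found a matching call
    // to drop broadcast_N once a job run is finished.
    val bc = sc.broadcast(Map("a" -> 1, "b" -> 2))
    sc.parallelize(1 to 100).map(i => i + bc.value.size).count()
    // (no unpersist()-style method on bc that I can see in my version)
  }
}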
Any help is appreciated. In the meantime I am experimenting with "spark.cleaner.ttl" to see if that alleviates things with the broadcast variables / MemoryStore.
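The experiment is nothing fancier than this (I'm assuming the TTL is picked up as a system property set before the SparkContext is created; 3600 seconds is an arbitrary value):

import org.apache.spark.SparkContext

object CleanerTtlExperiment {
  def main(args: Array[String]): Unit = {
    // Set before the SparkContext is created; 3600 seconds is an arbitrary TTL.
    System.setProperty("spark.cleaner.ttl", "3600")
    val sc = new SparkContext("local[4]", "cleaner-ttl-test")
    // ... run the usual jobs and watch whether old broadcast_N blocks get cleaned up ...
    sc.stop()
  }
}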
Nick