"GC overhead limit exceeded": Deceptive message?


Nathan Smutz

Aug 8, 2017, 1:20:56 AM
to Clojure
In the course of processing thousands of XML files (maximum size 388 KB, though I'm doing a lot of operations with zippers) I got this message:
OutOfMemoryError GC overhead limit exceeded  com.sun.org.apache.xerces.internal.xni.XMLString.toString

I can process about 2,100 files before that pops up. I set up a transducer sequence, and I can run count over 2,100 elements of it, as in (count (take 2100 requirement-seq)), without triggering the error; much more than that and I get the garbage-collector message. If it's just the garbage collector, I'd think it could stop between elements of the sequence and do its thing.

Does this message sometimes present because the non-garbage data is getting too big?

Best,
Nathan




Peter Hull

Aug 8, 2017, 4:20:21 AM
to Clojure

On Tuesday, 8 August 2017 06:20:56 UTC+1, Nathan Smutz wrote:
Does this message sometimes present because the non-garbage data is getting too big?
Yes: that message appears when most of your heap is live (non-garbage), so the GC has to keep running but frees very little memory each time.
See https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/memleaks002.html
 
You can increase the heap, but that might only defer the problem.
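
For example (a sketch assuming a Leiningen project; the 4g value is illustrative, not a recommendation):

;; in project.clj
:jvm-opts ["-Xmx4g"]   ;; raise the maximum heap size

;; or directly on the java command line:
;; java -Xmx4g -XX:-UseGCOverheadLimit ...

-XX:-UseGCOverheadLimit disables the overhead check entirely, but if the heap really is nearly all live data you'll just hit a plain OutOfMemoryError (or a crawl) later.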

As you process all your files, are you holding on to references to objects that you don't need any more?

Nathan Smutz

Aug 8, 2017, 12:19:01 PM
to Clojure
The one thing I'm aware of holding on to is a filtered file-seq: 
(def the-files (filter #(s/ends-with? (.getName %) ".xml") (rest (file-seq (io/file dw-path)))))
There are 7,000+ files, but I'm assuming the elements are just File references and shouldn't take much space.

The rest of the process is a transducer sequence:
(def requirement-seq
  (sequence
    (comp
      (map xml-zip-from-file)
      (remove degree-complete?)
      (map student-and-requirements))
    the-files))

Those functions are admittedly space-inefficient (lots of work with zippers), but they are pure. What comes out the other end is a sequence of Clojure maps. Could holding on to the file references prevent all that processing effluvia from being collected?

The original files add up to 1.3 GB altogether. I'd expect the gleaned data to be significantly smaller, but I'd better check how close it's getting to the default heap size.
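
For reference, a quick way to check the max heap from the REPL (standard JVM):

(/ (.maxMemory (Runtime/getRuntime)) 1024.0 1024.0)  ;; max heap in megabytes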

Best,
Nathan

Gary Trakhman

Aug 8, 2017, 12:38:43 PM
to Clojure
@Nathan, the top-level (def requirement-seq ..) is probably what's holding on to all the objects. Try removing the def and calling (last (sequence (comp ..) the-files)) and see if it returns. The point of a lazy sequence is to let processing happen one item (or chunk) at a time. If there are still problems, then maybe each element is too big, but that top-level def is definitely a no-no. I don't think transducers are relevant here; you'd get the same problem with ordinary map/remove calls.
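
Something like this, reusing your xform and the-files from the earlier post (a sketch, untested against your data):

;; no top-level Var retains the head, so each element can be collected
;; as soon as `last` has walked past it
(last
 (sequence
  (comp
   (map xml-zip-from-file)
   (remove degree-complete?)
   (map student-and-requirements))
  the-files))

;; if you do need all the results at once, realize them eagerly into a
;; vector of the final (small) maps rather than caching the lazy seq:
(def requirement-vec
  (into []
        (comp (map xml-zip-from-file)
              (remove degree-complete?)
              (map student-and-requirements))
        the-files))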


Paulus Esterhazy

Aug 8, 2017, 12:39:21 PM
to clo...@googlegroups.com
For background on "holding onto the head of a sequence" type problems, see

https://stuartsierra.com/2015/04/26/clojure-donts-concat

and

https://stackoverflow.com/questions/15994316/clojure-head-retention
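
The canonical illustration from that StackOverflow question, roughly:

(let [r (range 1e9)]
  [(first r) (last r)])
;; works: r's last use is (last r), so the compiler clears the local
;; and the head can be collected while the seq is walked

(let [r (range 1e9)]
  [(last r) (first r)])
;; OutOfMemoryError: r is still needed for the later (first r), so the
;; whole realized sequence is retained while (last r) walks it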

Nathan Smutz

Aug 9, 2017, 3:08:14 PM
to Clojure
Thanks @Paulus, @Gary and @Peter,

Rearranging the process to let go of the head is good advice.

I believe the problem (should I need to keep all the elements in memory) may ultimately be lazy collections inside the maps I'm producing: I saved 1,917 of these elements to disk and they took only 3 megabytes.

An inner function creates a lot of lazy sequences, I believe, closing over large zipper structures.
If that's the case, I need to wrap those sequences in doall or refactor to something more explicitly eager.
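
Something along these lines, I think (student-info, requirement-locs and requirement-map are stand-ins for my actual helpers):

(defn student-and-requirements [zipper]
  {:student      (student-info zipper)
   ;; mapv is eager: the inner seq is realized while the zipper is
   ;; still live, instead of a lazy seq closing over the whole zipper
   :requirements (mapv requirement-map (requirement-locs zipper))})

(Wrapping the existing lazy expressions in doall would have the same effect.)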