Persistent collections and garbage collection


pron

Feb 7, 2012, 11:16:49 AM
to clo...@googlegroups.com
Hi. I have a question:
I love Clojure's persistent collections, but don't they generate many long-lived objects (that go to the surviving generations) that are hard to garbage-collect? After all, the discarded nodes after "modification" are not necessarily short lived. It seems like they would behave badly from the GC perspective. Am I wrong?

Paudi Moriarty

Feb 8, 2012, 11:42:06 AM
to clo...@googlegroups.com
Hi Ron,

I think the persistent collections are no different from any other collections from a GC perspective in that you control which references you keep and for how long in your code. After "modification" some nodes may no longer be referenced and will be eligible for GC so I'm not sure what you mean by "not necessarily short lived". Surely what is short lived is completely dependent on the use case?

Also, since they share structure they must be better than copy-on-write collections in terms of object creation. That is the whole point. Obviously, compared to mutable collections there is some overhead due to additional node creation, but that's a trade-off and not relevant to your point about long-lived objects.
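Paudi's copy-on-write comparison can be made concrete with a small Java sketch. The names `SharingDemo`, `Cons`, `prepend` and `copyOnWriteSet` are hypothetical, not Clojure's or the JDK's. A persistent cons list allocates one node per prepend and shares everything else, while a copy-on-write update allocates a fresh copy of the whole backing array, which is how the `CopyOnWriteArrayList` Javadoc describes its mutations.

```java
import java.util.Arrays;

// Hypothetical demo class: persistent cons list versus copy-on-write array updates.
final class SharingDemo {

    // Immutable list node. Prepending allocates one node and reuses the old list as its tail.
    static final class Cons {
        final int head;
        final Cons tail;

        Cons(int head, Cons tail) {
            this.head = head;
            this.tail = tail;
        }
    }

    static Cons prepend(int value, Cons list) {
        return new Cons(value, list); // one allocation; every existing node is shared
    }

    // Copy-on-write update: a one-slot change allocates a copy of the entire array.
    static int[] copyOnWriteSet(int[] array, int index, int value) {
        int[] copy = Arrays.copyOf(array, array.length);
        copy[index] = value;
        return copy;
    }

    public static void main(String[] args) {
        Cons original = prepend(2, prepend(1, null));
        Cons extended = prepend(3, original);
        System.out.println(extended.tail == original); // true: the old list is shared, not copied

        int[] before = {1, 2, 3, 4};
        int[] after = copyOnWriteSet(before, 0, 9);
        System.out.println(before[0] + " " + after[0]); // "1 9": the old version is untouched
    }
}
```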

Paudi


--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

dgrnbrg

Feb 8, 2012, 11:19:08 PM
to Clojure
It seems to me that a generational collector would have problems
collecting Clojure's garbage production pattern. Luckily, the Oracle/HotSpot
JVM has a continuously collecting, compacting GC called G1. That
should mitigate old-space collection latency spikes. Enable it with
-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC
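Whichever collector flags you pass, it is worth confirming which collector the JVM actually picked. A minimal sketch using the standard `java.lang.management` API (the class `ShowCollectors` and its method are hypothetical names) lists the registered garbage collectors. The exact names are JVM-specific; HotSpot with G1 typically reports "G1 Young Generation" and "G1 Old Generation".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports the collectors the running JVM is using,
// so you can check that a flag such as -XX:+UseG1GC took effect.
final class ShowCollectors {

    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Names are JVM-specific and depend on which collector was selected.
        for (String name : collectorNames()) {
            System.out.println(name);
        }
    }
}
```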

pron

Feb 9, 2012, 7:30:36 PM
to clo...@googlegroups.com
Yes, that's what I thought. Does anyone have any experience with Clojure's garbage production pattern (esp. due to the persistent collections) and its behavior with the older GCs as well as with G1?

Sean Corfield

Feb 9, 2012, 9:35:35 PM
to clo...@googlegroups.com

These are the options we run with in production - Clojure is loaded at
runtime into a Tomcat application:

-Dsun.rmi.dgc.client.gcInterval=600000
-Dsun.rmi.dgc.server.gcInterval=600000
-XX:+ExplicitGCInvokesConcurrent -XX:+UseConcMarkSweepGC

That's the usual RMI GC fix, plus selecting the CMS GC, which seemed
to suit our memory usage better.

Part of the middleware we use explicitly requests GC approximately
every minute. Disabling those explicit GC calls led to long
stop-the-world GC cycles often enough to have a noticeable effect on
the end user experience so we set explicit calls to use CMS. This had
two benefits for us: consistent performance (the CMS GC bails if it takes
too long, so our request processing times stay fairly consistent) and
keeping memory usage to reasonable levels (which means we only get
occasional stop-the-world GC cycles and only under extreme bursts of
load). We have min/max heap set to 2.5GB. We may revisit this as we
ramp up traffic but for now this is working pretty well.

The amount of time per request spent in Clojure code varies between 5%
and 80% (depending on the type of request - some logic hardly hits the
Clojure code, some logic is implemented almost entirely in Clojure -
we expect the balance to continue to shift toward Clojure over time).
--
Sean A Corfield -- (904) 302-SEAN
An Architect's View -- http://corfield.org/
World Singles, LLC. -- http://worldsingles.com/

"Perfection is the enemy of the good."
-- Gustave Flaubert, French realist novelist (1821-1880)

pron

Feb 10, 2012, 7:01:26 AM
to clo...@googlegroups.com
And have you profiled the objects making it to the old generation (requiring/causing the full GC)? Are persistent collection nodes a significant part of those?
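A first step toward answering this without a full heap profiler is to count collections per collector around a representative piece of work. The sketch below assumes only the standard `GarbageCollectorMXBean` API; `GcDelta` and its methods are hypothetical names. It shows how many young and old collections ran during the workload, not which object types were promoted; attributing old-generation objects to persistent collection nodes still needs a heap profiler.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: how many collections each collector ran during a workload.
final class GcDelta {

    // Cumulative collection counts keyed by collector name.
    // getCollectionCount() returns -1 if the count is undefined for that collector.
    static Map<String, Long> snapshotCounts() {
        Map<String, Long> counts = new HashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            counts.put(gc.getName(), gc.getCollectionCount());
        }
        return counts;
    }

    // Runs the workload and returns, per collector, the number of collections in between.
    static Map<String, Long> countCollectionsDuring(Runnable workload) {
        Map<String, Long> before = snapshotCounts();
        workload.run();
        Map<String, Long> after = snapshotCounts();
        Map<String, Long> delta = new HashMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long start = before.getOrDefault(e.getKey(), 0L);
            delta.put(e.getKey(), e.getValue() - start);
        }
        return delta;
    }
}
```

In practice you would wrap a representative chunk of request handling in `countCollectionsDuring` and compare the young- and old-generation counts across runs.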

Armando Blancas

Feb 10, 2012, 10:56:28 AM
to Clojure
Not sure if I understand your concern, since I wonder how that's
different from, say, some ephemeral Java collection that happens to be
around after a given generational threshold and then becomes
unreachable.

Patrick Moriarty

Feb 10, 2012, 5:13:20 PM
to clo...@googlegroups.com
Hi Ron,

I've profiled a 20k-line Clojure application and have never seen anything like what you seem to be suggesting. It seems (though you've not been very clear) that you suspect that persistent collections may be holding references to nodes longer than necessary, that is, longer than a mutable collection would. My reading of the implementation suggests that this is not the case. Do you have any specific evidence or is this just a hunch? Any 'modification' to a persistent collection returns a new collection which may or may not share structure with the previous one. If you don't hold a reference to the previous version, that version and its unshared nodes become eligible for garbage collection. Am I missing something here?

Paudi

pron

Feb 12, 2012, 8:44:58 AM
to clo...@googlegroups.com
Alright, let me explain. I'm not talking about memory leaks or whether or not objects become eligible for collection when they should. The JVM garbage collector is a generational collector, which means there's a humongous difference in GC work between objects that become unreachable (eligible for GC) shortly after creation, and those that become unreachable after a while (specifically, after surviving a couple of "minor GC" cycles). While short-lived objects place a tiny, almost negligible burden on the application, long-lived objects place a big burden, and the worst are those in between: not collected immediately, but not reachable throughout the lifetime of the application either (that is, they are used for a little while, say a few seconds, and then discarded). They place a huge burden on the GC.
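The generational argument above can be illustrated with a toy model. It is an assumption, not HotSpot's real policy: a minor GC runs every `gcInterval` allocations, an object allocated just after a minor GC is still live at `lifetime / gcInterval` of the following minor GCs, and it is promoted once it has survived `tenuringThreshold` of them. Real collectors adjust the threshold dynamically (HotSpot, for example, can lower it when survivor space runs short). `TenuringModel` and its `Fate` values are hypothetical names.

```java
// Toy model (hypothetical, simplified): classifies an object's fate in a generational heap
// from its lifetime, measured in allocations.
final class TenuringModel {

    enum Fate { DIES_YOUNG, COPIED_THEN_DIES_YOUNG, PROMOTED_THEN_DIES, LIVES_FOREVER }

    // Minor GCs an object survives if it is allocated right after a minor GC,
    // a minor GC runs every gcInterval allocations, and it stays reachable for `lifetime`.
    static long minorGcsSurvived(long lifetime, long gcInterval) {
        return lifetime / gcInterval;
    }

    static Fate fate(long lifetime, long gcInterval, int tenuringThreshold, long runLength) {
        if (lifetime >= runLength) {
            return Fate.LIVES_FOREVER;          // promoted once, then stays reachable
        }
        long survived = minorGcsSurvived(lifetime, gcInterval);
        if (survived == 0) {
            return Fate.DIES_YOUNG;             // never copied: the cheap case
        }
        if (survived < tenuringThreshold) {
            return Fate.COPIED_THEN_DIES_YOUNG; // copied between survivor spaces, then dies
        }
        return Fate.PROMOTED_THEN_DIES;         // becomes old-generation garbage
    }
}
```

With `gcInterval = 1000` and `tenuringThreshold = 4`, an object living 500 allocations dies young, one living 2,500 is copied twice and then dies, and one living 10,000 is promoted and then has to be reclaimed from the old generation.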

Now, regular, mutable collections can live for very long, but modifying them usually does not mess with the GC much. If you replace a value in, say, a ConcurrentHashMap, no objects are created or discarded. But in Clojure, if you have a ref to a map, every assoc causes up to 6 PersistentHashMap nodes to become unreachable, and those may very well be the worst kind of medium-lived objects. OTOH, a high rate of assoc operations may constantly discard the same nodes, so they become short-lived objects and not much of a problem at all. So my question is: what is the "behavior profile" for persistent collection nodes in Clojure applications, and what is their measured effect on GC? Whatever the answer is, it's clear that their behavior from a GC perspective is very different from that of mutable collections.
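To make the "nodes discarded per assoc" point concrete, here is a minimal persistent 32-way trie in Java. It is in the spirit of the tries behind Clojure's vectors and hash maps but much simpler: it is indexed by a non-negative `int`, has a fixed number of levels, and omits the bitmap compression `PersistentHashMap` uses. `PathCopyTrie` is a hypothetical name. An update copies exactly one node per level. Once nothing references the old root, those old path nodes are the garbage in question, while untouched subtrees are shared by both versions.

```java
import java.util.Arrays;

// Hypothetical sketch of path copying in a 32-way trie (not Clojure's implementation).
// Valid indices are 0 <= index < 32^levels.
final class PathCopyTrie {
    static final int BITS = 5;
    static final int WIDTH = 1 << BITS; // 32 children per node
    static final int MASK = WIDTH - 1;

    final Object[] root;
    final int levels;

    PathCopyTrie(Object[] root, int levels) {
        this.root = root;
        this.levels = levels;
    }

    static PathCopyTrie empty(int levels) {
        return new PathCopyTrie(new Object[WIDTH], levels);
    }

    Object get(int index) {
        Object[] node = root;
        for (int level = levels - 1; level > 0; level--) {
            node = (Object[]) node[(index >>> (level * BITS)) & MASK];
            if (node == null) {
                return null;
            }
        }
        return node[index & MASK];
    }

    // Returns a new version; this version is unchanged. copies[0] counts nodes allocated.
    PathCopyTrie assoc(int index, Object value, int[] copies) {
        return new PathCopyTrie(assocIn(root, levels - 1, index, value, copies), levels);
    }

    private static Object[] assocIn(Object[] node, int level, int index, Object value, int[] copies) {
        // Copy only the node on the path; its other 31 slots still point at shared subtrees.
        Object[] copy = (node == null) ? new Object[WIDTH] : Arrays.copyOf(node, WIDTH);
        copies[0]++;
        int slot = (index >>> (level * BITS)) & MASK;
        if (level == 0) {
            copy[slot] = value;
        } else {
            copy[slot] = assocIn((Object[]) copy[slot], level - 1, index, value, copies);
        }
        return copy;
    }
}
```

With `levels = 2` (capacity 1,024 entries) every assoc allocates two 32-slot nodes, however many entries the trie already holds.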

Rob Lally

Feb 12, 2012, 3:33:37 PM
to clo...@googlegroups.com
I suspect that Clojure is actually slightly better here than the standard Java collections.

With standard Java collections, you'll tend to associate a collection with some attribute or property of an object and then modify it over time. The chances of this object living to a later generation depend on the use case. It doesn't matter how you modify it; the root object is probably still the same.

With a Clojure persistent collection, you might well have a reference that conceptually represents the same collection in memory. Over time this mutable reference to an immutable collection will change. The collection it changes to may share some of the internal state of the previous version of the collection associated with that reference. But it might not. When it doesn't, no part of the original collection will remain, so there's less chance of an object living long enough to make it to a later generation. Additionally, as I understand it, structural sharing only happens when it is efficient to do so: the internal unit of sharing for vectors is a node of up to 32 elements. So creating modified versions of a Clojure collection will not necessarily create objects with extended lifetimes.

The only case where Clojure's persistent collections might produce more objects that survive to later generations is when you create lots of derivative collections that happen to be similar enough to share structure. Java could be better here if the code pattern was to create collections and then create lots of derived collections by copying the original collection's contents and populating new collections with them. But that really isn't the Java way either.


Of course, I could be wrong...


Rob.




Sean Corfield

Feb 12, 2012, 5:01:18 PM
to clo...@googlegroups.com
On Sun, Feb 12, 2012 at 5:44 AM, pron <ron.pr...@gmail.com> wrote:
> So my question is, what is the
> "behavior profile" for persistent collection nodes in Clojure applications,
> and what is their measured effect on GC.

I think that's going to depend on what your code does and how it
behaves - which is the same answer pretty much regardless of the
language used: you profile your app under load and you tune your JVM
settings, including GC behavior. I suspect some Clojure apps will
behave poorly with default settings, just like some Java apps do. It's
also a truism that an application written in Java will likely have a
very different JVM performance profile than "the same" app written in
Clojure (or any other language sufficiently different from Java)
because they're not really "the same" at all.
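One cheap signal to sample while profiling under load is heap pool occupancy. For the promote-then-die pattern discussed here, you would expect the old-generation pool's current usage to climb steadily between old collections while its usage right after each collection stays roughly flat; a rising post-collection level points instead at a growing live set. This is a sketch using the standard `MemoryPoolMXBean` API; `HeapPools` is a hypothetical name, and pool names depend on the collector in use, for example "PS Old Gen" or "G1 Old Gen".

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: per-pool heap usage, now and right after the last collection.
final class HeapPools {

    // Bytes currently in use in each heap pool.
    static Map<String, Long> currentUsed() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (pool.getType() == MemoryType.HEAP && usage != null) {
                result.put(pool.getName(), usage.getUsed());
            }
        }
        return result;
    }

    // Bytes in use in each heap pool right after its most recent collection.
    // Pools that do not report post-collection usage are skipped.
    static Map<String, Long> usedAfterLastGc() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (pool.getType() == MemoryType.HEAP && afterGc != null) {
                result.put(pool.getName(), afterGc.getUsed());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        currentUsed().forEach((name, used) -> System.out.println(name + " now: " + used));
        usedAfterLastGc().forEach((name, used) -> System.out.println(name + " after GC: " + used));
    }
}
```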

In other words, I agree with your conclusion that a purely functional
application based on persistent data structures will have a different
GC profile from a purely imperative application based on mutable data
structures. That doesn't imply Clojure will be worse (or better) than
the Java "equivalent", merely different.

The Scala community has also dealt with this difference because they
have persistent data structures and knowledge is shared back and forth
between the Clojure and Scala implementors. They've done a lot of work
ensuring persistent data structures perform well on the JVM. Check out
Phil Bagwell's talk from Clojure/conj for more detailed information:

http://blip.tv/clojure/phill-bagwell-striving-to-make-things-simple-and-fast-5936145

pron

Feb 13, 2012, 4:50:16 PM
to clo...@googlegroups.com
I watched Phil Bagwell's talk and found it very interesting, but as far as I remember he doesn't discuss GC.
Anyway, let's leave this as an "open question", and I'd be interested in hearing from people who've memory-profiled their persistent collections. But I can understand from your answer that there are no serious problems when it comes to GC in Clojure apps, which is very good to know.
