Hi. I have a question: I love Clojure's persistent collections, but don't they generate many long-lived objects (that get promoted to the surviving generations) that are hard to garbage-collect? After all, the nodes discarded after a "modification" are not necessarily short-lived. It seems like they would behave badly from the GC perspective. Am I wrong?
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
These are the options we run with in production - Clojure is loaded at
runtime into a Tomcat application:
-Dsun.rmi.dgc.client.gcInterval=600000
-Dsun.rmi.dgc.server.gcInterval=600000
-XX:+ExplicitGCInvokesConcurrent -XX:+UseConcMarkSweepGC
That's the usual RMI GC fix, plus selection of the CMS collector, which
seemed to suit our memory usage better.
Part of the middleware we use explicitly requests GC approximately
every minute. Disabling those explicit GC calls led to long
stop-the-world GC cycles often enough to have a noticeable effect on
the end user experience so we set explicit calls to use CMS. This had
two benefits for us: consistent performance (the CMS GC bails if it
takes too long, so our request processing times stay fairly consistent) and
keeping memory usage to reasonable levels (which means we only get
occasional stop-the-world GC cycles and only under extreme bursts of
load). We have min/max heap set to 2.5GB. We may revisit this as we
ramp up traffic but for now this is working pretty well.
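For illustration, a minimal sketch of what such a periodic explicit GC request might look like (the once-a-minute interval and the use of System.gc() here are assumptions, not the middleware's actual code; with -XX:+ExplicitGCInvokesConcurrent, System.gc() starts a concurrent CMS cycle instead of a stop-the-world full collection):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicGc {
    // Schedule an explicit GC request roughly once a minute. With
    // -XX:+ExplicitGCInvokesConcurrent, System.gc() triggers a concurrent
    // CMS cycle rather than pausing all application threads.
    public static ScheduledExecutorService start() {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(System::gc, 1, 1, TimeUnit.MINUTES);
        return scheduler;
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = start();
        // ... application runs here; shut down on exit ...
        scheduler.shutdown();
    }
}
```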
The amount of time per request spent in Clojure code varies between 5%
and 80% (depending on the type of request - some logic hardly hits the
Clojure code, some logic is implemented almost entirely in Clojure -
we expect the balance to continue to shift toward Clojure over time).
--
Sean A Corfield -- (904) 302-SEAN
An Architect's View -- http://corfield.org/
World Singles, LLC. -- http://worldsingles.com/
"Perfection is the enemy of the good."
-- Gustave Flaubert, French realist novelist (1821-1880)
I've profiled a 20k line Clojure application and have never seen anything like what you seem to be suggesting. It seems (though you've not been very clear) that you suspect that persistent collections may be holding references to nodes longer than necessary - that is, longer than a mutable collection would. My reading of the implementation suggests that this is not the case. Do you have any specific evidence, or is this just a hunch? Any "modification" to a persistent collection returns a new collection, which may or may not share structure with the previous one. If you don't hold a reference to the previous version, that collection and its unshared nodes become eligible for garbage collection. Am I missing something here?
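To make the structural-sharing point concrete, here's a toy persistent stack in Java (this is not Clojure's actual implementation, just an illustration of the principle): "modifying" it returns a new version that shares its tail with the old one, so dropping a reference to either version makes only that version's unshared node eligible for collection.

```java
// Toy persistent (immutable) stack illustrating structural sharing.
public final class PStack<T> {
    final T head;          // value at the top; null for the empty stack
    final PStack<T> tail;  // shared with every version pushed on top of it

    private PStack(T head, PStack<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    public static <T> PStack<T> empty() {
        return new PStack<>(null, null);
    }

    // "Modification" allocates exactly one new node; the rest is shared.
    public PStack<T> push(T value) {
        return new PStack<>(value, this);
    }

    // Pop returns the shared tail -- no allocation at all.
    public PStack<T> pop() {
        return tail;
    }

    public T peek() {
        return head;
    }
}
```

If v2 = v1.push(x) and you later drop every reference to v2, the single node it added is the only garbage created; v1 is untouched and still fully usable.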
Paudi
Alright, let me explain. I'm not talking about memory leaks or whether objects become eligible for collection when they should. The JVM garbage collector is a generational collector, which means there's a huge difference in GC work between objects that become unreachable (eligible for GC) shortly after creation and those that become unreachable after a while (specifically, after surviving a couple of "minor GC" cycles). Short-lived objects place a tiny, almost negligible burden on the application, while long-lived objects place a big burden. The worst are those in between: not collected immediately, but not reachable throughout the lifetime of the application either (meaning they are used for a little while, say a few seconds, and then discarded). They place a huge burden on the GC.

Now, regular mutable collections can live for very long, but modifying them usually does not mess with the GC much. If you replace a value in, say, a ConcurrentHashMap, no objects are created or discarded. But in Clojure, if you have a ref to a map, every assoc causes up to 6 PersistentHashMap nodes to become unreachable, and those may very well be the worst kind of medium-lived objects. OTOH, a high rate of assoc operations may constantly discard the same nodes, so they become short-lived objects and not much of a problem at all.

So my question is: what is the "behavior profile" for persistent collection nodes in Clojure applications, and what is their measured effect on GC? Whatever the answer is, it's clear that their behavior from a GC perspective is very different from that of mutable collections.
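One way to get at that "measured effect" empirically is to watch the JVM's collector counts and times while the workload runs. A rough sketch using the standard management beans (the churn loop is just a stand-in for a high rate of assoc-style path copying, not real Clojure node allocation):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcProfile {
    // Snapshot total collection count and accumulated time (ms)
    // across all registered collectors.
    public static long[] snapshot() {
        long count = 0, time = 0;
        for (GarbageCollectorMXBean gc :
                ManagementFactory.getGarbageCollectorMXBeans()) {
            count += Math.max(0, gc.getCollectionCount());
            time  += Math.max(0, gc.getCollectionTime());
        }
        return new long[] { count, time };
    }

    public static void main(String[] args) {
        long[] before = snapshot();

        // Stand-in for assoc-style churn: mostly short-lived "node" arrays,
        // with a small fraction retained so they survive into medium age.
        List<int[]> live = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            int[] nodes = new int[32];          // a "path copy"
            if (i % 1000 == 0) live.add(nodes); // a few become medium-lived
        }

        long[] after = snapshot();
        System.out.println("GC cycles: " + (after[0] - before[0])
                + ", GC time (ms): " + (after[1] - before[1])
                + ", retained: " + live.size());
    }
}
```

Comparing these deltas for a mostly-discard-immediately workload against one that retains versions for a few seconds would show whether the nodes are landing in the cheap short-lived bucket or the expensive in-between one.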
I think that's going to depend on what your code does and how it
behaves - which is the same answer pretty much regardless of the
language used: you profile your app under load and you tune your JVM
settings, including GC behavior. I suspect some Clojure apps will
behave poorly with default settings, just like some Java apps do. It's
also a truism that an application written in Java will likely have a
very different JVM performance profile than "the same" app written in
Clojure (or any other language sufficiently different from Java)
because they're not really "the same" at all.
In other words, I agree with your conclusion that a purely functional
application based on persistent data structures will have a different
GC profile to a purely imperative application based on mutable data
structures. That doesn't imply Clojure will be worse (or better) than
the Java "equivalent", merely different.
The Scala community has also dealt with this difference because they
have persistent data structures and knowledge is shared back and forth
between the Clojure and Scala implementors. They've done a lot of work
ensuring persistent data structures perform well on the JVM. Check out
Phil Bagwell's talk from Clojure/conj for more detailed information:
http://blip.tv/clojure/phill-bagwell-striving-to-make-things-simple-and-fast-5936145