When a Clojure application is used within a servlet container
(Jetty/Tomcat/Immutant/etc.), it leaks memory on undeployment or
redeployment. The amount of memory leaked varies, and is a sum of the
sizes of all of the classes loaded via the deployment's classloader
and its children. This is caused by something holding a hard
reference to one or more objects, which through [the object -> the
object's class -> the class' classloader -> all classes loaded by that
classloader] prevents all of the application's classes from being
GC'ed. This eventually causes the container to run out of permgen. The
current workaround is to restart the container on every undeploy or
redeploy, which negatively impacts any other applications in the same
container.
I've yet to open an issue for this, I wanted to start a discussion
here first. I chatted with Rich about it briefly at the Conj, and he
urged me to start a dialog.
# What holds the references?
In this case, clojure.lang.Var$Frame and
clojure.lang.LockingTransaction objects are being held by two
ThreadLocals: Var.dvals and LockingTransaction.transaction,
respectively. These two ThreadLocals aren't properly removed when they
are no longer needed.
Var.dvals is used to store the stack frames for each call.
LockingTransaction.transaction stores the currently active transaction
for STM, and is only active within an STM call.
# Why is this only seen in containers?
Servlet containers manage their own thread pools, and use these thread
pools across all applications within the container, with a lifecycle
that is bound to the JVM process, not to any of the applications. So
when an application is removed from (or redeployed to) the container,
any ThreadLocals it may have set on any of those threads will cause
the full classloader tree from the prior deployment to be retained.
When used outside of a shared container environment (i.e. only one
clojure runtime per JVM, which is intended to go away when the
JVM exits), the JVM doesn't continue to run after the runtime is
removed, so there is no opportunity for the memory leak.
# A patch for all seasons
I have a patch that appears in my testing to correctly remove the
dvals and transaction ThreadLocals when they are no longer needed (see
below). I've tested it in both Jetty and Immutant, and confirmed that
it eliminates the memory leakage. For context, you can see the code
with the patches applied at:
https://github.com/tobias/clojure/blob/threadlocal-removal/src/jvm/clojure/lang/Var.java#L361
https://github.com/tobias/clojure/blob/threadlocal-removal/src/jvm/clojure/lang/LockingTransaction.java#L223
diff --git a/src/jvm/clojure/lang/LockingTransaction.java b/src/jvm/clojure/lang/LockingTransaction.java
index 44d2de6..48aea1f 100644
--- a/src/jvm/clojure/lang/LockingTransaction.java
+++ b/src/jvm/clojure/lang/LockingTransaction.java
@@ -222,13 +222,21 @@ static LockingTransaction getRunning(){
static public Object runInTransaction(Callable fn) throws Exception{
LockingTransaction t = transaction.get();
- if(t == null)
+ boolean outer = false;
+ if(t == null) {
transaction.set(t = new LockingTransaction());
-
- if(
t.info != null)
- return fn.call();
-
- return t.run(fn);
+ outer = true;
+ }
+
+ if(
t.info != null)
+ return fn.call();
+
+ try {
+ return t.run(fn);
+ } finally {
+ if (outer)
+ transaction.remove();
+ }
}
static class Notify{
diff --git a/src/jvm/clojure/lang/Var.java b/src/jvm/clojure/lang/Var.java
index 1630889..96f68cd 100644
--- a/src/jvm/clojure/lang/Var.java
+++ b/src/jvm/clojure/lang/Var.java
@@ -362,7 +362,13 @@ public static void popThreadBindings(){
Frame f = dvals.get();
if(f.prev == null)
throw new IllegalStateException("Pop without matching push");
- dvals.set(f.prev);
+ f = f.prev;
+ if (f.prev == null) {
+ dvals.remove();
+ } else {
+ dvals.set(f);
+ }
+
}
public static Associative getThreadBindings(){
# Performance implications
Since the Var change adds logic to *every* function call, I'm
concerned that there may be performance implications. The patch
introduces an additional null check to every call, and a
ThreadLocal#remove() call when the call stack is empty (which occurs
on exit from any code called on another thread (agents, etc), or when
returning control to a caller that is outside of the clojure runtime).
I benchmarked the following simple code using Criterium:
(defn f []
(str :a :b :c))
(defn -main
[& args]
(criterium.core/bench
(dosync (f))))
Without the patch applied (a local build of 1.5.0-master-SNAPSHOT as
of today), I get:
Evaluation count : 13812420
Execution time mean : 4.359034 us 95.0% CI: (4.358805 us, 4.359302 us)
Execution time std-deviation : 25.694606 ns 95.0% CI: (25.395340 ns, 26.057529 ns)
Execution time lower ci : 4.332653 us 95.0% CI: (4.332653 us, 4.334702 us)
Execution time upper ci : 4.378642 us 95.0% CI: (4.378642 us, 4.380838 us)
With just the dvals patch applied, I get:
Evaluation count : 13902600
Execution time mean : 4.348634 us 95.0% CI: (4.348488 us, 4.348726 us)
Execution time std-deviation : 17.124359 ns 95.0% CI: (17.022510 ns, 17.224815 ns)
Execution time lower ci : 4.325087 us 95.0% CI: (4.325033 us, 4.325087 us)
Execution time upper ci : 4.380438 us 95.0% CI: (4.380212 us, 4.380438 us)
Which appears to be the same as the pre-patch performance. This is
what I suspected, since the only real change is the extra null check
in each call - the remove only occurs when -main exits.
With both patches applied, I get:
Evaluation count : 7576740
Execution time mean : 8.229411 us 95.0% CI: (8.228566 us, 8.230451 us)
Execution time std-deviation : 132.444024 ns 95.0% CI: (131.100606 ns, 133.570078 ns)
Execution time lower ci : 8.036427 us 95.0% CI: (8.036143 us, 8.036427 us)
Execution time upper ci : 8.417164 us 95.0% CI: (8.417164 us, 8.418130 us)
Which leads me to believe the cost of the remove plus the try/finally
adds ~4 us to each dosync invocation.
# Are there other places where this could be a problem?
I'm glad you asked! Clojure currently uses ThreadLocals in two other
places:
* clojure.instant uses one to store a SimpleDateFormat object - this
won't cause a leak since SimpleDateFormat is loaded by the system
classloader.
* clojure.lang.Agent uses one to store nested agent calls - from
reading the code and local testing, the nested ThreadLocal appears
to be properly removed.
# Open questions
There are a few questions that need to be answered before this patch
should be applied, namely:
* Does anyone know of any consequences of this change that I have
missed?
* Is the above an adequate benchmark?
* Is there a better way to solve this problem?
* Is it too late to get this fixed in 1.5?
* Should any of this be configurable (perhaps via a dynamic var)? I'm
concerned that the cost to check that var may be as high or higher
than the check/try/remove code, and will add complexity.
* Is the ~4 us overhead for STM an issue? If you are using STM, you're
implicitly accepting the time cost of eventual consistency in the
case of a collision. I suspect that a single retry takes longer
than 4 us, given the use of exceptions for control (though I
haven't verified that).
---
Toby Crawley
http://immutant.org |
http://torquebox.org