Clojure can leak memory when used in a servlet container

604 views
Skip to first unread message

Toby Crawley

unread,
Nov 19, 2012, 4:32:24 PM11/19/12
to cloju...@googlegroups.com
When a Clojure application is used within a servlet container
(Jetty/Tomcat/Immutant/etc.), it leaks memory on undeployment or
redeployment. The amount of memory leaked varies, and is a sum of the
sizes of all of the classes loaded via the deployment's classloader
and its children. This is caused by something holding a hard
reference to one or more objects, which through [the object -> the
object's class -> the class' classloader -> all classes loaded by that
classloader] prevents all of the application's classes from being
GC'ed. This eventually causes the container to run out of permgen. The
current workaround is to restart the container on every undeploy or
redeploy, which negatively impacts any other applications in the same
container.

I've yet to open an issue for this, I wanted to start a discussion
here first. I chatted with Rich about it briefly at the Conj, and he
urged me to start a dialog.

# What holds the references?

In this case, clojure.lang.Var$Frame and
clojure.lang.LockingTransaction objects are being held by two
ThreadLocals: Var.dvals and LockingTransaction.transaction,
respectively. These two ThreadLocals aren't properly removed when they
are no longer needed.

Var.dvals is used to store the stack frames for each call.
LockingTransaction.transaction stores the currently active transaction
for STM, and is only active within an STM call.

# Why is this only seen in containers?

Servlet containers manage their own thread pools, and use these thread
pools across all applications within the container, with a lifecycle
that is bound to the JVM process, not to any of the applications. So
when an application is removed from (or redeployed to) the container,
any ThreadLocals it may have set on any of those threads will cause
the full classloader tree from the prior deployment to be retained.

When used outside of a shared container environment (i.e. only one
clojure runtime per JVM, which is intended to go away when the
JVM exits), the JVM doesn't continue to run after the runtime is
removed, so there is no opportunity for the memory leak.

# A patch for all seasons

I have a patch that appears in my testing to correctly remove the
dvals and transaction ThreadLocals when they are no longer needed (see
below). I've tested it in both Jetty and Immutant, and confirmed that
it eliminates the memory leakage. For context, you can see the code
with the patches applied at:
https://github.com/tobias/clojure/blob/threadlocal-removal/src/jvm/clojure/lang/Var.java#L361
https://github.com/tobias/clojure/blob/threadlocal-removal/src/jvm/clojure/lang/LockingTransaction.java#L223

diff --git a/src/jvm/clojure/lang/LockingTransaction.java b/src/jvm/clojure/lang/LockingTransaction.java
index 44d2de6..48aea1f 100644
--- a/src/jvm/clojure/lang/LockingTransaction.java
+++ b/src/jvm/clojure/lang/LockingTransaction.java
@@ -222,13 +222,21 @@ static LockingTransaction getRunning(){

static public Object runInTransaction(Callable fn) throws Exception{
LockingTransaction t = transaction.get();
- if(t == null)
+ boolean outer = false;
+ if(t == null) {
transaction.set(t = new LockingTransaction());
-
- if(t.info != null)
- return fn.call();
-
- return t.run(fn);
+ outer = true;
+ }
+
+ if(t.info != null)
+ return fn.call();
+
+ try {
+ return t.run(fn);
+ } finally {
+ if (outer)
+ transaction.remove();
+ }
}

static class Notify{
diff --git a/src/jvm/clojure/lang/Var.java b/src/jvm/clojure/lang/Var.java
index 1630889..96f68cd 100644
--- a/src/jvm/clojure/lang/Var.java
+++ b/src/jvm/clojure/lang/Var.java
@@ -362,7 +362,13 @@ public static void popThreadBindings(){
Frame f = dvals.get();
if(f.prev == null)
throw new IllegalStateException("Pop without matching push");
- dvals.set(f.prev);
+ f = f.prev;
+ if (f.prev == null) {
+ dvals.remove();
+ } else {
+ dvals.set(f);
+ }
+
}

public static Associative getThreadBindings(){


# Performance implications

Since the Var change adds logic to *every* function call, I'm
concerned that there may be performance implications. The patch
introduces an additional null check to every call, and a
ThreadLocal#remove() call when the call stack is empty (which occurs
on exit from any code called on another thread (agents, etc), or when
returning control to a caller that is outside of the clojure runtime).

I benchmarked the following simple code using Criterium:

(defn f []
(str :a :b :c))

(defn -main
[& args]
(criterium.core/bench
(dosync (f))))

Without the patch applied (a local build of 1.5.0-master-SNAPSHOT as
of today), I get:

Evaluation count : 13812420
Execution time mean : 4.359034 us 95.0% CI: (4.358805 us, 4.359302 us)
Execution time std-deviation : 25.694606 ns 95.0% CI: (25.395340 ns, 26.057529 ns)
Execution time lower ci : 4.332653 us 95.0% CI: (4.332653 us, 4.334702 us)
Execution time upper ci : 4.378642 us 95.0% CI: (4.378642 us, 4.380838 us)

With just the dvals patch applied, I get:

Evaluation count : 13902600
Execution time mean : 4.348634 us 95.0% CI: (4.348488 us, 4.348726 us)
Execution time std-deviation : 17.124359 ns 95.0% CI: (17.022510 ns, 17.224815 ns)
Execution time lower ci : 4.325087 us 95.0% CI: (4.325033 us, 4.325087 us)
Execution time upper ci : 4.380438 us 95.0% CI: (4.380212 us, 4.380438 us)

Which appears to be the same as the pre-patch performance. This is
what I suspected, since the only real change is the extra null check
in each call - the remove only occurs when -main exits.

With both patches applied, I get:

Evaluation count : 7576740
Execution time mean : 8.229411 us 95.0% CI: (8.228566 us, 8.230451 us)
Execution time std-deviation : 132.444024 ns 95.0% CI: (131.100606 ns, 133.570078 ns)
Execution time lower ci : 8.036427 us 95.0% CI: (8.036143 us, 8.036427 us)
Execution time upper ci : 8.417164 us 95.0% CI: (8.417164 us, 8.418130 us)

Which leads me to believe the cost of the remove plus the try/finally
adds ~4 us to each dosync invocation.

# Are there other places where this could be a problem?

I'm glad you asked! Clojure currently uses ThreadLocals in two other
places:

* clojure.instant uses one to store a SimpleDateFormat object - this
won't cause a leak since SimpleDateFormat is loaded by the system
classloader.

* clojure.lang.Agent uses one to store nested agent calls - from
reading the code and local testing, the nested ThreadLocal appears
to be properly removed.

# Open questions

There are a few questions that need to be answered before this patch
should be applied, namely:

* Does anyone know of any consequences of this change that I have
missed?

* Is the above an adequate benchmark?

* Is there a better way to solve this problem?

* Is it too late to get this fixed in 1.5?

* Should any of this be configurable (perhaps via a dynamic var)? I'm
concerned that the cost to check that var may be as high or higher
than the check/try/remove code, and will add complexity.

* Is the ~4 us overhead for STM an issue? If you are using STM, you're
implicitly accepting the time cost of eventual consistency in the
case of a collision. I suspect that a single retry takes longer
than 4 us, given the use of exceptions for control (though I
haven't verified that).

---
Toby Crawley
http://immutant.org | http://torquebox.org




Paul Stadig

unread,
Nov 19, 2012, 10:24:13 PM11/19/12
to cloju...@googlegroups.com
On Mon, Nov 19, 2012 at 4:32 PM, Toby Crawley <to...@tcrawley.org> wrote:
> # What holds the references?
>
> In this case, clojure.lang.Var$Frame and
> clojure.lang.LockingTransaction objects are being held by two
> ThreadLocals: Var.dvals and LockingTransaction.transaction,
> respectively. These two ThreadLocals aren't properly removed when they
> are no longer needed.
>
> Var.dvals is used to store the stack frames for each call.

Var.dvals is actually only used to store dynamic bindings (either
through use of the bindings macro or a direct use of the
push-thread-bindings function), so it is not as bad as you may think.
Changes to Var.popThreadBindings should not cause an impact for every
function invocation through a var. With the dynamic/static var changes
in 1.3+ most code doesn't bother to even check dvals, but just calls
straight into the var's root.

> # A patch for all seasons
>
> I have a patch that appears in my testing to correctly remove the
> dvals and transaction ThreadLocals when they are no longer needed (see
> below). I've tested it in both Jetty and Immutant, and confirmed that
> it eliminates the memory leakage.

Based on my understanding of the code, I don't think this change would
eliminate the memory leak. Any code that calls dvals.get() would
populate the ThreadLocal with an instance of Var$Frame, and isBound
calls dvals.get, so I think any code that touches a dynamic var would
cause the memory leak. If you had a test case that did something like

(def ^:dynamic *foo* 1)
*foo*

then I think the container would not GC the classloader. What test
case did you use to verify the memory leak was gone?

I think ultimately what you'd probably have to do is have the dvals
ThreadLocal initialize to null instead of an instance to Var$Frame,
and any time you handle a dynamic var consider a ThreadLocal with a
bound value of null to mean unbound. That *would* affect the
performance of accessing a dynamic var (including calling a fun
through a var though I suspect that isn't a very common case). I'm not
sure how much of a performance hit that might be, perhaps it would be
negligible?

The LockingTransaction changes look fine to me, since
LockingTransaction uses null as an unbound sentinel value.


Paul

Toby Crawley

unread,
Nov 20, 2012, 12:36:26 PM11/20/12
to cloju...@googlegroups.com
Paul:

Thanks for the feedback.

On Nov 19, 2012, at 10:24 PM, Paul Stadig <pa...@stadig.name> wrote:
> Var.dvals is actually only used to store dynamic bindings (either
> through use of the bindings macro or a direct use of the
> push-thread-bindings function), so it is not as bad as you may think.
> Changes to Var.popThreadBindings should not cause an impact for every
> function invocation through a var. With the dynamic/static var changes
> in 1.3+ most code doesn't bother to even check dvals, but just calls
> straight into the var's root.

Thanks for the correction. Without the foolish assumption that dvals is call stack frames, the code is so much clearer :)

> Based on my understanding of the code, I don't think this change would
> eliminate the memory leak. Any code that calls dvals.get() would
> populate the ThreadLocal with an instance of Var$Frame, and isBound
> calls dvals.get, so I think any code that touches a dynamic var would
> cause the memory leak. If you had a test case that did something like
>
> (def ^:dynamic *foo* 1)
> *foo*

isBound first checks to see if the Var is threadbound, which only gets set in Var.pushThreadBindings. All of the pushThreadBindings call appear to have a corresponding popThreadBindings call, which is where I call dvals.remove. So I believe this case is safe.

>
> then I think the container would not GC the classloader. What test
> case did you use to verify the memory leak was gone?

My test is to redeploy and access (via http) a simple ring app (that now contains an unbound dynamic var access as you suggested above) in a loop with a sleep between each redeploy while I monitor permgen and the unloaded class count via VisualVM. Without the patch, I exhaust a permgen of 82MB within 10 deployments, with a class total loaded/unloaded ratio of 17k/80. With the patch, I can run the same loop 100 times without exhausting permgen, with a final class total ratio of 13k/121k. And GC reaches a steady state of growth/shrinkage with each deployment w/o continuing to grow.

However, it's not all roses. Given your pointer about dvals.get usage above, I audited its usage, and found three more cases which will cause a leak, even with the patch applied:

* calling Var.cloneThreadBindingFrame (via core/binding-conveyor-fn, used by core/send-via & core/future-call)
* calling Var.getThreadBindings (via core/get-thread-bindings)
* calling Var.getThreadBindingFrame, which is not actually used anywhere in the clojure codebase, but is public

I'm going to take a look at resolving those. I believe the solution is based off of what you said below - dvals should probably not have an initialValue method, and anywhere we call get should then check for null.

I'll report back.

> I think ultimately what you'd probably have to do is have the dvals
> ThreadLocal initialize to null instead of an instance to Var$Frame,
> and any time you handle a dynamic var consider a ThreadLocal with a
> bound value of null to mean unbound. That *would* affect the
> performance of accessing a dynamic var (including calling a fun
> through a var though I suspect that isn't a very common case). I'm not
> sure how much of a performance hit that might be, perhaps it would be
> negligible?
>

Toby Crawley

unread,
Dec 10, 2012, 1:00:50 PM12/10/12
to cloju...@googlegroups.com
I took another look at the thread leaks caused by Var.dvals, and added a
few more changes to the patch, namely:

* dvals no longer has an initialValue method. With one, any call to
dvals.get() would cause it to be set to an empty Var$Frame, even if
it wasn't used. And if that happened outside the auspices of a
pushThreadBindings/popThreadBindings pair, it could leak. Since
dvals no longer has a default value, we now have to check for null
on any dvals.get call. getThreadBindingFrame does this already, so I
replaced most of the dvals.get calls with calls to
getThreadBindingFrame.

* Var.getThreadBindingFrame now has a return value of Var$Frame
instead of Object. It's not used anywhere I can find within the
Clojure codebase, and I don't think this change should cause any
issues with any third-party consumers.

These fixes eliminate all the memory leaks that were generated by my
test code (which was derived from the benchmark code below), except in
the case where agents or futures are used, and the executors aren't
shut down, either via (shutdown-agents) or an explicit .shutdown call
on the executors themselves.

This is due to how binding-conveyor-fn sets the bindings when ran - it
replaces them in the executor thread, and has no mechanism to remove
them. Removing them is complicated, because it is hard to know when
the thread is finished with them, since :error-handler fns need the
bindings to be set.

I'm willing to accept leaving dvals set on agent threads, since the
default executors used by clojure are per-runtime (and therefore per
app when used in an isolated container), and those executors should be
shutdown on undeploy. And if the app doesn't shutdown the executors on
undeploy (or the container doesn't do it), then you have a thread leak
as well, so it's important that any apps running in a container do so
on undeploy. Immutant already calls (shutdown-agents) on behalf of
each app on undeploy, but uberwar'ed apps running under
Tomcat/Jetty/etc would likely need to implement something to insure
that the executors are shutdown.

# Performance

The performance impact of this latest patch is basically the same as
that of the prior version. The benchmark code now does more to
exercise more of the points touched by the patch, so takes longer
overall. The benchmark:

(defn f []
(str :a :b :c))

(def ^:dynamic *foo* 1)

(defn -main
[& args]
(c/bench
(do
*foo*
@(future-call (constantly 1))
(get-thread-bindings)
(dosync
(f))))
(shutdown-agents))

The results:

stock 1.5.0, w/dosync:

Evaluation count : 3465780
Execution time mean : 18.242435 us 95.0% CI: (18.234051 us, 18.252651 us)
Execution time std-deviation : 1.155943 us 95.0% CI: (1.143688 us, 1.173428 us)
Execution time lower ci : 17.261465 us 95.0% CI: (17.261465 us, 17.261465 us)
Execution time upper ci : 20.384970 us 95.0% CI: (20.384970 us, 20.384970 us)

Found 6 outliers in 60 samples (10.0000 %)
low-severe 4 (6.6667 %)
low-mild 2 (3.3333 %)
Variance from outliers : 46.8182 % Variance is moderately inflated by outliers

w/o dosync (commented out of the code above):

Evaluation count : 3407820
Execution time mean : 18.016824 us 95.0% CI: (18.011655 us, 18.024937 us)
Execution time std-deviation : 854.666494 ns 95.0% CI: (848.952064 ns, 863.049382 ns)
Execution time lower ci : 17.330211 us 95.0% CI: (17.330211 us, 17.330211 us)
Execution time upper ci : 19.872663 us 95.0% CI: (19.866577 us, 19.872663 us)

Found 7 outliers in 60 samples (11.6667 %)
low-severe 6 (10.0000 %)
low-mild 1 (1.6667 %)
Variance from outliers : 33.5824 % Variance is moderately inflated by outliers

with the patch applied, no dosync in the benchmark:

Evaluation count : 3182040
Execution time mean : 18.852075 us 95.0% CI: (18.848215 us, 18.857049 us)
Execution time std-deviation : 467.356898 ns 95.0% CI: (464.327537 ns, 470.686106 ns)
Execution time lower ci : 18.084154 us 95.0% CI: (18.084154 us, 18.084154 us)
Execution time upper ci : 19.538687 us 95.0% CI: (19.537033 us, 19.538687 us)

Found 1 outliers in 60 samples (1.6667 %)
low-severe 1 (1.6667 %)
Variance from outliers : 12.5800 % Variance is moderately inflated by outliers

with the patch applied, and dosync in the benchmark:

Compiling tl-test.core
Evaluation count : 2458440
Execution time mean : 24.681252 us 95.0% CI: (24.679057 us, 24.684407 us)
Execution time std-deviation : 295.450106 ns 95.0% CI: (292.978106 ns, 298.009002 ns)
Execution time lower ci : 24.282472 us 95.0% CI: (24.282472 us, 24.282472 us)
Execution time upper ci : 25.141700 us 95.0% CI: (25.141700 us, 25.145132 us)

Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers

# The full patch

You can see the full patch at https://gist.github.com/0618b31127ef289214f6,
and can see the patch applied at
https://github.com/tobias/clojure/blob/threadlocal-removal/src/jvm/clojure/lang/Var.java
and
https://github.com/tobias/clojure/blob/threadlocal-removal/src/jvm/clojure/lang/LockingTransaction.java
if you need to see it in context.

# What next?

I'd like to open a jira for this issue. Any objections?
> --
> You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
> To post to this group, send email to cloju...@googlegroups.com.
> To unsubscribe from this group, send email to clojure-dev...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/clojure-dev?hl=en.

Rich Hickey

unread,
Dec 11, 2012, 1:17:32 PM12/11/12
to cloju...@googlegroups.com
That would be great, thanks!

Rich

Toby Crawley

unread,
Dec 11, 2012, 4:03:36 PM12/11/12
to cloju...@googlegroups.com

On Dec 11, 2012, at 1:17 PM, Rich Hickey <richh...@gmail.com> wrote:

>> I'd like to open a jira for this issue. Any objections?
>
> That would be great, thanks!

Done: http://dev.clojure.org/jira/browse/CLJ-1125
Reply all
Reply to author
Forward
0 new messages