CodeNarc source of PermGen leak?


Cedric Champeau

Aug 24, 2015, 3:37:57 AM
to gradl...@googlegroups.com, blac...@gmx.org
Hi,

Sterling and I have been investigating the intermittent PermGen space errors that occur in the build. They seem to have become much worse with the upgrade to Groovy 2.4.4 on master. I "fixed" the problem by increasing the PermGen space for Groovy compile tasks, then locally disabled the fix to take some heap dumps, using:

gradle/compile.groovy:
tasks.withType(GroovyCompile) {
    options.forkOptions.jvmArgs = ['-XX:+HeapDumpOnOutOfMemoryError','-XX:HeapDumpPath=/tmp/dump']
}

gradle/integTest.groovy
integTestTasks.all {
    jvmArgs '-Xmx512m', '-XX:MaxPermSize=256m', '-XX:+HeapDumpOnOutOfMemoryError', '-XX:HeapDumpPath=/tmp/dump'
}

And I run:

JAVA_HOME=/opt/jdk1.7.0_75 gw clean intTest

It will systematically fail with an OOM, and what the dump shows is a lot of org.codehaus.groovy.util.ReferenceType$SoftRef instances, which should normally be released when their referenced class is freed.

However, analyzing the heap dump, I found this:

[Inline image 1]

As you can see, the class which is still loaded is CodeNarcTask. I wonder why we have it although CodeNarc checks are not yet executed. Whichever Ref I pick at random, following the chain of "next" members, I end up at this CodeNarcTask class, which prevents unloading. I also fail to see why the situation is worse with 2.4.4. To be clear: this already happened with 2.3.10 (Sterling confirmed he could reproduce it before the upgrade to 2.4.4), but it now happens much more often and, more importantly, it happens systematically if we don't tweak the PermGen VM parameters.

Here is what Sterling found:

"To be clear, this was happening with Groovy 2.3.10, too. I could make it OOM by running this test ~20 times (the test runs CodeNarc):
https://github.com/gradle/gradle/blob/master/subprojects/reporting/src/integTest/groovy/org/gradle/api/reporting/plugins/BuildDashboardPluginIntegrationTest.groovy#L354-354

It's still happening with 2.4.4, but not any sooner (after ~20 runs), so I'm not sure if 2.4.4 is worse for this particular case.

One suspicious thing I found is in the IsolatedAntBuilder:
https://github.com/gradle/gradle/blob/master/subprojects/core/src/main/groovy/org/gradle/api/internal/project/DefaultIsolatedAntBuilder.groovy#L89-89

We need an isolated classloader with each IsolatedAntBuilder.withClasspath() when executing all of the code quality plugins (CodeNarc, PMD, etc), but we're also caching that classloader. I don't see where this is ever cleared? Naively creating the classloader each time didn't seem to prevent the OOM, but maybe there's more to it."

The fact that it doesn't happen sooner with 2.4.4 may be related to the changes I made to increase the default PermGen space. However, without those changes, there were lots of failing builds on CI. One other important point is that the failures can happen during *compilation* of Groovy classes. An example of such a failing build can be found here: 


Honestly, I'm not sure what is happening here. Has anyone seen something like this before?

Any idea would be greatly appreciated, especially because it seems that this problem could be the cause for the performance regressions in master (I have seen a performance build failing with surprisingly bad numbers).

--
Cédric Champeau
Principal Engineer
Gradle, Inc.

Jochen Theodorou

Aug 24, 2015, 4:52:32 AM
to gradl...@googlegroups.com
On 24.08.2015 09:37, Cedric Champeau wrote:
> Hi,
>
> Sterling and I have been investigating the intermittent PermGen space
> errors that occur in the build. They seem to have become much worse with
> the upgrade to Groovy 2.4.4 on master.

then I would suggest doing a bisect to find the version that is causing
the problem.


[...]
> It will systematically fail with an OOM, and what the dump shows is a
> lot of org.codehaus.groovy.util.ReferenceType$SoftRef classes, which
> should normally be released when their referenced class is freed.

The management of the references is unchanged since late 1.6 I think,
and it works like this: for the first 500 references (see
ReferenceManager:155) we don't poll the reference queue at all. That
means even if the referenced object is already garbage, the SoftRef will
continue to exist. The next time a SoftRef is created we check the queue
for garbage objects and clean up their SoftRefs. After that the SoftRef
itself is garbage and should be collected via normal garbage collection -
next time that is.

So it is necessarily strange to have many SoftRef instances.
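
The polling scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `ThresholdReferenceManager`, `track`, `drainQueue` and `liveCount` are hypothetical names, and the code is not Groovy's actual `ReferenceManager`. It only shows why stale SoftRef objects can linger until the next reference is created.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; this is NOT Groovy's ReferenceManager.
class ThresholdReferenceManager {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
    // References we still track; stale ones stay here until the queue is drained.
    private final Set<Reference<?>> live = new HashSet<Reference<?>>();
    private final int threshold;
    private int created;

    ThresholdReferenceManager(int threshold) {
        this.threshold = threshold;
    }

    // Creates a tracked SoftReference. The queue is only polled once more
    // than `threshold` references have been created.
    synchronized SoftReference<Object> track(Object referent) {
        created++;
        if (created > threshold) {
            drainQueue();
        }
        SoftReference<Object> ref = new SoftReference<Object>(referent, queue);
        live.add(ref);
        return ref;
    }

    // Removes every reference that has been enqueued since the last poll.
    synchronized int drainQueue() {
        int removed = 0;
        Reference<?> stale;
        while ((stale = queue.poll()) != null) {
            if (live.remove(stale)) {
                removed++;
            }
        }
        return removed;
    }

    synchronized int liveCount() {
        return live.size();
    }
}
```

In this sketch an enqueued reference is only dropped from the tracking set when the next `track` call drains the queue, so the reference object itself becomes collectable one step later.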

> However, analyzing the heap dump, I found this:
>
> [Inline image 1]
>
> As you can see, the class which is still loaded is CodeNarcTask. I
> wonder why we have it although CodeNarc checks are not yet executed.

checks are not executed, but there might be extension methods, it might
be that something was called on the class, for example to inspect it.
there are many possibilities. You would have to set a break point in the
ClassInfo/MetaClass generation to see why

> Whatever random Ref I choose, through a chain of "next" member, I will
> end up to this CodeNarcTask class, which prevents unloading.

why does the CodeNarcTask prevent unloading?

> I also fail
> to see why the situation is worse with 2.4.4. To be clear: this already
> happened with 2.3.10 (Sterling confirmed he could reproduce before the
> upgrade to 2.4.4), but it's now happening much more often and, more
> importantly, it systematically happens if we don't tweak the PermGen VM
> parameters.

I think you will have to bisect for this one. If you have a fragile
balance, then even a seemingly unrelated change can make everything
fall apart.

> Here is what Sterling found:
>
> "To be clear, this was happening with Groovy 2.3.10, too. I could make
> it OOM by running this test ~20 times (the test runs CodeNarc):
> https://github.com/gradle/gradle/blob/master/subprojects/reporting/src/integTest/groovy/org/gradle/api/reporting/plugins/BuildDashboardPluginIntegrationTest.groovy#L354-354
>
> It's still happening with 2.4.4, but not any sooner (after ~20 runs), so
> I'm not sure if 2.4.4 is worse for this particular case.
>
> One suspicious thing I found is in the IsolatedAntBuilder:
> https://github.com/gradle/gradle/blob/master/subprojects/core/src/main/groovy/org/gradle/api/internal/project/DefaultIsolatedAntBuilder.groovy#L89-89
>
> We need an isolated classloader with each
> IsolatedAntBuilder.withClasspath() when executing all of the code
> quality plugins (CodeNarc, PMD, etc), but we're also caching that
> classloader. I don't see where this is ever cleared? Naively creating
> the classloader each time didn't seem to prevent the OOM, but maybe
> there's more to it."

creating lots of URLClassLoaders referencing jar resources can also lead
to problems, since the files are kept open and their file references are
not always closed. But I have not ventured enough into this to know if
that will keep the classloader itself alive as well. It is advised to
call close() on the loader once you are done with it.
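
A minimal sketch of that advice, assuming Java 7 or later (where `URLClassLoader` implements `Closeable`). The class and method names `ClosingLoaderExample` and `loadAndClose` are made up for illustration; try-with-resources simply guarantees that `close()` is called even if loading fails.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical helper; names are illustrative only.
class ClosingLoaderExample {

    // Loads a class through a short-lived URLClassLoader and closes the
    // loader (releasing any jar file handles it opened) before returning.
    static String loadAndClose(File classpathEntry, String className)
            throws IOException, ClassNotFoundException {
        URL[] urls = { classpathEntry.toURI().toURL() };
        try (URLClassLoader loader = new URLClassLoader(urls)) {
            return loader.loadClass(className).getName();
        }
    }

    public static void main(String[] args) throws Exception {
        // java.lang.String is resolved by the parent loader; it is used here
        // only so that the example runs without any extra jar.
        System.out.println(loadAndClose(new File("."), "java.lang.String"));
    }
}
```

Closing the loader releases its open files; whether the loader and its classes can then be garbage collected still depends on nothing else referencing them.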

> The fact that it doesn't happen sooner with 2.4.4 is maybe related to
> the changes I made to increase the default PermGen space. However,
> without those changes, there were lots of failing builds on CI. Also one
> thing which is important is that the failures can happen during
> *compilation* of Groovy classes.

The compilation of Groovy classes can also cause the execution of Groovy
classes. Not the ones you are compiling of course, but extensions to the
compiler like transforms.

bye blackdrag

--
Jochen "blackdrag" Theodorou
blog: http://blackdragsview.blogspot.com/

Cedric Champeau

Aug 24, 2015, 5:30:21 AM
to gradl...@googlegroups.com
2015-08-24 10:52 GMT+02:00 Jochen Theodorou <blac...@gmx.org>:
> On 24.08.2015 09:37, Cedric Champeau wrote:
> > Hi,
> >
> > Sterling and I have been investigating the intermittent PermGen space
> > errors that occur in the build. They seem to have become much worse with
> > the upgrade to Groovy 2.4.4 on master.
>
> then I would suggest doing a bisect to find the version that is causing the problem.

I will try to, but I fear it might not be as easy as changing a version number in the build properties.

> [...]
> > It will systematically fail with an OOM, and what the dump shows is a
> > lot of org.codehaus.groovy.util.ReferenceType$SoftRef classes, which
> > should normally be released when their referenced class is freed.
>
> The managing of the reference is unchanged since late 1.6 I think, but it is done like this: for the first 500 references (see ReferenceManager:155) we don't poll the reference queue at all. That means even if the referenced Object is already garbage, the SoftRef will continue to exist. The next time a SoftRef is created we check the queue for garbage object and clean the SoftRef. After that the SoftRef itself is garbage and should be collected via normal garbage collection - next time that is.
>
> So it is necessarily strange to have many SoftRef instances.

Right. In my dump I have 24875 instances of SoftRef, retaining 64MB of data (which would correspond to the default limit of PermGen space).

> > However, analyzing the heap dump, I found this:
> >
> > [Inline image 1]
> >
> > As you can see, the class which is still loaded is CodeNarcTask. I
> > wonder why we have it although CodeNarc checks are not yet executed.
>
> checks are not executed, but there might be extension methods, it might be that something was called on the class, for example to inspect it. there are many possibilities. You would have to set a break point in the ClassInfo/MetaClass generation to see why

ok

> > Whatever random Ref I choose, through a chain of "next" member, I will
> > end up to this CodeNarcTask class, which prevents unloading.
>
> why does the CodeNarcTask prevent unloading?

I am not sure, I still have to understand what is going on.
Correct, one case was the SAM type coercion, which triggers loading (but not initialization) of a class, in the Gradle TransformLoader. One possibility is that we reuse the transform loader, but it would be surprising. I have to check that.

> bye blackdrag
>
> --
> Jochen "blackdrag" Theodorou
> blog: http://blackdragsview.blogspot.com/

--
You received this message because you are subscribed to the Google Groups "gradle-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gradle-dev+...@googlegroups.com.
To post to this group, send email to gradl...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gradle-dev/55DADB61.4010201%40gmx.org.
For more options, visit https://groups.google.com/d/optout.

Cedric Champeau

Aug 24, 2015, 5:35:08 PM
to gradl...@googlegroups.com, Jochen Theodorou
So after another day of debugging, it seems it's really not trivial. I've made a lot of changes to debug, including optimizations to avoid loading classes too eagerly, but none solved the problem. The `IsolatedAntBuilder` seems to be the source of the leak, and `ClassInfo` seems to be the other end of it. The Groovy runtime is loaded multiple times (which is exactly what we want, because different tasks can have different classpaths), but for some reason, it cannot be unloaded. I'm seeing that this `ClassInfo` instance, in Java 7, is retained by a `ClassValue`, which corresponds to a class of the `primitiveTypes` array of `sun.reflect.AccessorGenerator`. This array is a static Class[] array containing Integer.TYPE, ...

[Inline image 1]

So could it be that we set a ClassInfo value for a primitive type, which prevents the ClassInfo from being unloaded, and in turn, prevents the classloader from being unloaded, so the whole runtime... 
Of course, you're going to tell me it can be a JDK bug, so let's see what the Java 6 version says. It still fails at more or less the same point with a PermGen space error, but this time YourKit is not able to find a strongly reachable path to a GC root. So in that case, it might just mean that JDK 6 is not able to unload the classes (I don't recall if you need an explicit VM flag like -XX:+CMSClassUnloadingEnabled). So yes, it might well just be a JDK bug, but how can we be sure?

Tomorrow I'm going to check if I can trace the classes which we load, then force clearing the `ClassValue` instances. I have no idea if it is doable, entering a domain I've not played with yet.
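
To make the `ClassValue` behaviour under discussion concrete, here is a small sketch using only the public `java.lang.ClassValue` API (Java 7+). The names `ClassValueCacheDemo` and `CountingValue` are invented for illustration, and this does not show how to reach the `ClassValue` held inside Groovy's `ClassInfo`. It only shows that a computed value stays attached to a class such as `Integer.TYPE` until `remove` is called for that class.

```java
// Hypothetical demo; names are illustrative and unrelated to Groovy's ClassInfo.
class ClassValueCacheDemo {

    // Counts how often computeValue runs so that caching becomes visible.
    static final class CountingValue extends ClassValue<Object> {
        int computations;

        @Override
        protected Object computeValue(Class<?> type) {
            computations++;
            return new Object();
        }
    }

    public static void main(String[] args) {
        CountingValue cv = new CountingValue();

        // The first get() computes the value and attaches it to int.class.
        Object first = cv.get(Integer.TYPE);
        // The second get() returns the same cached instance.
        Object second = cv.get(Integer.TYPE);
        System.out.println("cached: " + (first == second));

        // remove() detaches the value; the next get() recomputes it.
        cv.remove(Integer.TYPE);
        Object third = cv.get(Integer.TYPE);
        System.out.println("recomputed: " + (first != third));
        System.out.println("computations: " + cv.computations);
    }
}
```

Because the value is stored against a class loaded by the bootstrap loader, it lives as long as the entry does, which is why an entry for a primitive type is a plausible candidate for keeping other objects reachable.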

Sterling Greene

Aug 24, 2015, 7:03:08 PM
to gradl...@googlegroups.com
Just noticed this on CodeNarc's GH page: https://github.com/CodeNarc/CodeNarc/issues/116

Not much new information (other than it maybe works with 2.3, which would have been before we turned on class reuse in the daemon).

Jochen Theodorou

Aug 25, 2015, 3:15:17 AM
to gradl...@googlegroups.com
On 24.08.2015 23:35, Cedric Champeau wrote:
> So after a new day of debugging, it seems it's really not trivial. I've
> made a lot of changes to debug, including optimizations to avoid too
> eagerly loading classes, but none solved the problem. The
> `IsolatedAntBuilder` seems to be the source of the leak, and `ClassInfo`
> seems to be the other end of it. The Groovy runtime is loaded multiple times
> (which is exactly what we want, because for different tasks we can have
> different classpath), but for some reason, it cannot be unloaded. I'm
> seeing that this `ClassInfo` instance, in Java 7, is retained by a
> `ClassValue`, which corresponds to a class of the `primitiveTypes` array
> of `sun.reflect.AccessorGenerator`. This array is a static Class[] array
> containing Integer.TYPE, ...

hmm... true... we do add extension methods to system classes. This could
indeed prevent unloading of the groovy runtime. I am considering all
java.lang classes here. I don't think a primitive type is special in
that regard.

[...]
> Of course, you're going to tell me it can be a JDK bug

It is debatable. I think this has not been specified.

> let's see what
> the Java 6 version says. It still fails more or less at the same time
> with a PermGen space issue, but this time, YourKit is not able to find a
> path to the GC root, which is strongly reachable. So in that case, it
> might just mean that JDK 6 is not able to unload the classes (I don't
> recall if you need an explicit VM flag like
> -XX:+CMSClassUnloadingEnabled). So yes, it might just well be a JDK bug,
> but how to be sure?

Try the flags and see what happens. I would try finding a "minimal"
permgen setting that is required without the flags, then add the flags
and lower the permgen setting. If there is an improvement, you know it
works.

> Tomorrow I'm going to check if I can trace the classes which we load,
> then force clearing the `ClassValue` instances. I have no idea if it is
> doable, entering a domain I've not played with yet.

it's non-public API you will be using

Jochen Theodorou

Aug 25, 2015, 3:19:23 AM
to gradl...@googlegroups.com
the last mail got to Cedric only by accident...

in the meantime I wrote a small test program like this:

import java.net.*;
import java.io.*;

// CVTest.java, in the current directory (run with -ea so the asserts are checked)
public class CVTest {

    public static void main(String[] args) throws Throwable {
        for (long i = 0; i < 10000000; i++) {
            // a fresh loader per iteration; MyClassValue and Dummy live in t/
            File dir = new File("t/");
            URLClassLoader classLoader = new URLClassLoader(new URL[]{dir.toURI().toURL()});
            ClassValue cv = (ClassValue) classLoader.loadClass("MyClassValue").newInstance();
            // attach a value created by the child loader to a system class
            Object value = cv.get(Integer.TYPE);
            assert value != null;
            assert value.getClass().getClassLoader() == classLoader;
            classLoader.close();
        }
    }
}

// t/MyClassValue.java
public class MyClassValue extends ClassValue {
    @Override
    protected Object computeValue(Class type) {
        return new Dummy();
    }
}

// t/Dummy.java
public class Dummy {}


In this simpler scenario, the latter two classes are in subdirectory t
of the current directory and are loaded at runtime in a loop. We set a
ClassValue from this, assert it is really there and then repeat for a
long time. I did not run the program to the end, but jvisualvm data
suggests that there is a lot of class unloading happening on a regular
basis. So I would assume that ClassValue does not prevent the loader and
its loaded classes from being unloaded.

Of course that is no guarantee for a more complex case to work as well,
but it suggests that this is how it should behave

bye blackdrag

Cedric Champeau

Aug 25, 2015, 3:47:16 AM
to gradl...@googlegroups.com
Thanks Jochen. I'll continue my investigation. For what it's worth, adding '-XX:+CMSClassUnloadingEnabled', '-XX:+UseConcMarkSweepGC' to the Java 6 options did not solve the problem. So there's definitely something which prevents the runtime from being unloaded... but what...
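
One way to check from code whether a loader can be collected at all is to hold it only through a `WeakReference` and ask for a few GC cycles. This is a rough sketch under assumptions: `LoaderReachabilityCheck` and `becomesUnreachable` are hypothetical names, and since `System.gc()` is only a hint, a `false` result is suggestive rather than proof of a leak.

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical helper; names are illustrative only.
class LoaderReachabilityCheck {

    // Requests up to `attempts` GC cycles and reports whether the referent
    // was cleared, i.e. whether nothing strongly references it any more.
    static boolean becomesUnreachable(WeakReference<?> ref, int attempts)
            throws InterruptedException {
        for (int i = 0; i < attempts && ref.get() != null; i++) {
            System.gc();
            Thread.sleep(50);
        }
        return ref.get() == null;
    }

    public static void main(String[] args) throws Exception {
        URLClassLoader loader = new URLClassLoader(new URL[0]);
        WeakReference<ClassLoader> ref = new WeakReference<ClassLoader>(loader);
        loader.close();
        loader = null; // drop the only strong reference held by this method
        System.out.println("collected: " + becomesUnreachable(ref, 10));
    }
}
```

Applied to a loader that has actually loaded the Groovy runtime, a result that stays `false` would point in the same direction as the heap dumps: something still holds a strong reference.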


Cedric Champeau

Aug 26, 2015, 12:52:32 PM
to gradl...@googlegroups.com
So I have ruled out both the Groovy upgrade and the CodeNarc upgrade as the source of the problem. I think I now have a much simpler setup that reproduces the error:

package doNotCommit;

import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class PermGenLeak {

    public static final String GROOVY_JAR = "/home/cchampeau/.gvm/groovy/2.4.4/lib/groovy-2.4.4.jar";

    public static void main(String[] args) throws Exception {
        int i = 0;
        try {
            while (true) {
                i++;
                // fresh loader per iteration, parented above the system class loader,
                // so the Groovy runtime is loaded again each time
                URLClassLoader loader = new URLClassLoader(
                        new URL[] { new File(GROOVY_JAR).toURI().toURL() },
                        ClassLoader.getSystemClassLoader().getParent());
                Class<?> system = loader.loadClass("groovy.lang.GroovySystem");
                // initialize the Groovy runtime, then stop its reference manager thread
                system.getDeclaredMethod("getMetaClassRegistry").invoke(null);
                system.getDeclaredMethod("stopThreadedReferenceManager").invoke(null);
                loader.close();
                System.gc();
                System.out.println("That's your chance to take a heap dump!");
                Thread.sleep(4000);
            }
        } catch (OutOfMemoryError e) {
            System.err.println("Failed after " + i + " loadings");
        }
    }
}

The bad news is that it is beyond the control of Gradle...

Cedric Champeau

Aug 29, 2015, 5:50:41 PM
to gradle-dev
So here is a summary of my findings: http://melix.github.io/blog/2015/08/permgenleak.html

As you will see, this is far from being as simple as a leak in CodeNarc....

Luke Daley

Sep 1, 2015, 1:17:21 AM
to gradl...@googlegroups.com, Cedric Champeau
I look forward to sitting down with a coffee and comprehending all of this. After a quick scan, looks like I might be on to hard liquor by the end.

On 30 August 2015 at 7:50:42 am, Cedric Champeau (ced...@gradle.com) wrote:

So here is a summary of my findings: http://melix.github.io/blog/2015/08/permgenleak.html

As you will see, this is far from being as simple as a leak in CodeNarc....