Improving Groovy script compilation performance

66 views
Skip to first unread message

Cedric Champeau

unread,
Sep 27, 2015, 12:11:51 PM9/27/15
to gradle-dev
Hi guys,

I was investigating how to improve performance of compilation of Groovy scripts. As you may know, I recently introduced org.gradle.groovy.scripts.internal.DefaultScriptCompilationHandler.ShortcutClassNodeResolver, which greatly improved compilation times by avoiding calls to classloader.findClass for some special class names patterns.

I think that we can improve this even better. One issue today is that we dumbly add a lot of default imports to scripts. The list of default imports is found in a file named default-imports.txt which is generated at build time. This list is big (175 entries), which is equivalent to adding 175 star imports at the beginning at each script.

So each time a class literal is found in a script, the compiler has to look into the star imports and check if a class with that name exists in the package. For example, when it sees "JvmLibrarySpec", it has to test:

org.gradle.JvmLibrarySpec
org.gradle.api.JvmLibrarySpec
org.gradle.api.artifacts.JvmLibrarySpec
...
until it finds the right one: 
org.gradle.jvm.JvmLibrarySpec

which is the 74th entry in the import list!

I can see several ways to improve this:

1. remove some imports from the default list in order to reduce the number of classes to check
2. reorder imports to have the most likely packages first
3. only add imports based on the plugins which are applied
4. add a cache which is shared by all compilations, based on the classpath. It is different from the cache in CachingClassLoader, because it would act at the resolver level, and would be aimed at caching misses.

Item 1. is a quick win, but has several disadvantages: it may be a breaking change (some classes might not be found anymore by default), and it might imply that users would have to add "import xxx.yyy.zzz" to build scripts, which is not very friendly.
Item 2. is the easiest thing to do, but it requires some "statistics" of usage, or a good idea of what is frequently used. We can generate those out of our own build, I have attached an example of what the resolver tries to resolve for the integration tests in language-java/src/integTest/groovy/org/gradle/language/java. The number corresponds to the number of times the resolver tried to resolve this particular class name in the whole test suite. In the end, this solution doesn't change the worst case, which is looking up the whole list of imports.
Item 3 sounds like a good idea but it very hard to implement, because when we compile the script, it is also to know which plugins are applied, so we have a chicken and egg issue here.
Item 4 might be very interesting performance wise, but not that easy to implement. It's not because a class is not found in one script, that it should not be found in another. I have to take a closer look at how we compile scripts to see what we can do.


--
Cédric Champeau
Principal Engineer
Gradle, Inc.

stats-resolve-javalang-inttest.txt

Sterling Greene

unread,
Sep 27, 2015, 12:35:28 PM9/27/15
to gradl...@googlegroups.com
Is it better/worse if we replaced some star imports with specific class imports for the most likely used classes before any star imports?

Do you have any guesses at how much we'd gain if we improved the imports (using any method)?  Is this just a one time hit (during script compilation)?

--
You received this message because you are subscribed to the Google Groups "gradle-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gradle-dev+...@googlegroups.com.
To post to this group, send email to gradl...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gradle-dev/CAPHf-2fzEcuiLmADFC0Uq%3D9UYYcdhi-gA%2BKMHAD-U9o09dU4jQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Hans Dockter

unread,
Sep 27, 2015, 12:46:01 PM9/27/15
to gradl...@googlegroups.com

Compilation time is a significant usability issue for larger projects. So any improvement there is welcome.

@Cedric: Any idea what savings that could give us? E.g. how long does a compile take if you disable default imports?


Cedric Champeau

unread,
Sep 27, 2015, 12:47:09 PM9/27/15
to gradle-dev
2015-09-27 18:35 GMT+02:00 Sterling Greene <ster...@gradle.com>:
Is it better/worse if we replaced some star imports with specific class imports for the most likely used classes before any star imports?

Specific classes should be much faster, because they are directly resolved when visitClass is called.
 

Do you have any guesses at how much we'd gain if we improved the imports (using any method)?  Is this just a one time hit (during script compilation)?
Given the impressive performance improvements that we see using simple exclusion strategies in ShortcutClassNodeResolver, and given the fact that resolving is what consumes most of the time, I expect great performance improvements, but it's difficult to say without trying and measuring.
 

For more options, visit https://groups.google.com/d/optout.

Adam Murdoch

unread,
Sep 27, 2015, 4:08:53 PM9/27/15
to gradl...@googlegroups.com
On Mon, Sep 28, 2015 at 2:45 AM, Hans Dockter <ha...@gradle.com> wrote:

Compilation time is a significant usability issue for larger projects. So any improvement there is welcome.


Script compilation in 2.8 is about twice as fast as it was in 2.0, which was about twice as fast as 1.0.

 

For more options, visit https://groups.google.com/d/optout.



--
Adam Murdoch
Gradle Co-founder
http://www.gradle.org
CTO Gradle Inc. - Gradle Training, Support, Consulting
http://www.gradle.com

Adam Murdoch

unread,
Sep 27, 2015, 4:29:39 PM9/27/15
to gradl...@googlegroups.com
On Mon, Sep 28, 2015 at 2:11 AM, Cedric Champeau <ced...@gradle.com> wrote:
Hi guys,

I was investigating how to improve performance of compilation of Groovy scripts. As you may know, I recently introduced org.gradle.groovy.scripts.internal.DefaultScriptCompilationHandler.ShortcutClassNodeResolver, which greatly improved compilation times by avoiding calls to classloader.findClass for some special class names patterns.

I think that we can improve this even better. One issue today is that we dumbly add a lot of default imports to scripts. The list of default imports is found in a file named default-imports.txt which is generated at build time. This list is big (175 entries), which is equivalent to adding 175 star imports at the beginning at each script.

So each time a class literal is found in a script, the compiler has to look into the star imports and check if a class with that name exists in the package. For example, when it sees "JvmLibrarySpec", it has to test:

org.gradle.JvmLibrarySpec
org.gradle.api.JvmLibrarySpec
org.gradle.api.artifacts.JvmLibrarySpec
...
until it finds the right one: 
org.gradle.jvm.JvmLibrarySpec

which is the 74th entry in the import list!

I can see several ways to improve this:

1. remove some imports from the default list in order to reduce the number of classes to check
2. reorder imports to have the most likely packages first
3. only add imports based on the plugins which are applied
4. add a cache which is shared by all compilations, based on the classpath. It is different from the cache in CachingClassLoader, because it would act at the resolver level, and would be aimed at caching misses.

Item 1. is a quick win, but has several disadvantages: it may be a breaking change (some classes might not be found anymore by default), and it might imply that users would have to add "import xxx.yyy.zzz" to build scripts, which is not very friendly.

It’s very much a breaking change.
 
Item 2. is the easiest thing to do, but it requires some "statistics" of usage, or a good idea of what is frequently used. We can generate those out of our own build, I have attached an example of what the resolver tries to resolve for the integration tests in language-java/src/integTest/groovy/org/gradle/language/java. The number corresponds to the number of times the resolver tried to resolve this particular class name in the whole test suite. In the end, this solution doesn't change the worst case, which is looking up the whole list of imports.
Item 3 sounds like a good idea but it very hard to implement, because when we compile the script, it is also to know which plugins are applied, so we have a chicken and egg issue here.

It’s also a breaking change.
 
Item 4 might be very interesting performance wise, but not that easy to implement. It's not because a class is not found in one script, that it should not be found in another. I have to take a closer look at how we compile scripts to see what we can do.

The default imports are the same for all the scripts, as is the API that is visible to scripts. So, the solution should take advantage of the fact that given a name `T`, if this name resolves to an API class in one script, then `T` should resolve to that class in every other script. And if `T` does not resolve to an API class in one script, the `T` should not resolve to any class in every other script.

It would be nice if we could insert a resolve strategy into `ResolveVisitor` that is used just before it attempts to resolve a type using the star imports. This strategy would then be able to efficiently resolve `T` to an API class and could be backed by a shared cache. The current hook in `ClassNodeResolver` is much too late to do this efficiently.

 


--
Cédric Champeau
Principal Engineer
Gradle, Inc.

--
You received this message because you are subscribed to the Google Groups "gradle-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gradle-dev+...@googlegroups.com.
To post to this group, send email to gradl...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gradle-dev/CAPHf-2fzEcuiLmADFC0Uq%3D9UYYcdhi-gA%2BKMHAD-U9o09dU4jQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.



--

Hans Dockter

unread,
Sep 27, 2015, 8:48:49 PM9/27/15
to gradl...@googlegroups.com
Yes. But there seems to be a regression when compiling scripts using the new model. When I played with the bigNewJava project from the performance project that seemed to be the case. I don't have numbers but it was pretty obvious.

Hans

Cedric Champeau

unread,
Sep 28, 2015, 3:20:47 AM9/28/15
to gradle-dev

The default imports are the same for all the scripts, as is the API that is visible to scripts. So, the solution should take advantage of the fact that given a name `T`, if this name resolves to an API class in one script, then `T` should resolve to that class in every other script. And if `T` does not resolve to an API class in one script, the `T` should not resolve to any class in every other script.

Is it really true, in terms of how it is currently implemented? My understanding is that we have 2 steps: 1 for being able to load the "buildscript" classpath, and the 2d one for the whole scripts. This imply at least 2 different classloaders, so a class that doesn't exist in one classloader (a class used in a plugin) might not be visible in another (the buildscript part).
 
It would be nice if we could insert a resolve strategy into `ResolveVisitor` that is used just before it attempts to resolve a type using the star imports. This strategy would then be able to efficiently resolve `T` to an API class and could be backed by a shared cache. The current hook in `ClassNodeResolver` is much too late to do this efficiently.

Yes. The resolution logic is a bit complex, but it should be possible to do something like that. However, I am not sure it would be more efficient than having an explicit list of all imports (not star imports).


 


--
Cédric Champeau
Principal Engineer
Gradle, Inc.

--
You received this message because you are subscribed to the Google Groups "gradle-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gradle-dev+...@googlegroups.com.
To post to this group, send email to gradl...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gradle-dev/CAPHf-2fzEcuiLmADFC0Uq%3D9UYYcdhi-gA%2BKMHAD-U9o09dU4jQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.



--
Adam Murdoch
Gradle Co-founder
http://www.gradle.org
CTO Gradle Inc. - Gradle Training, Support, Consulting
http://www.gradle.com

--
You received this message because you are subscribed to the Google Groups "gradle-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gradle-dev+...@googlegroups.com.
To post to this group, send email to gradl...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--

Adam Murdoch

unread,
Sep 28, 2015, 3:33:37 AM9/28/15
to gradl...@googlegroups.com
On Mon, Sep 28, 2015 at 5:20 PM, Cedric Champeau <ced...@gradle.com> wrote:

The default imports are the same for all the scripts, as is the API that is visible to scripts. So, the solution should take advantage of the fact that given a name `T`, if this name resolves to an API class in one script, then `T` should resolve to that class in every other script. And if `T` does not resolve to an API class in one script, the `T` should not resolve to any class in every other script.

Is it really true, in terms of how it is currently implemented? My understanding is that we have 2 steps: 1 for being able to load the "buildscript" classpath, and the 2d one for the whole scripts. This imply at least 2 different classloaders, so a class that doesn't exist in one classloader (a class used in a plugin) might not be visible in another (the buildscript part).

Yes it’s true. The Gradle API is visible in both cases and is exactly the same for all script compilation, and so we can reuse the result of resolving a name to an API class.

There’s usually other, script specific, classes that are visible to the script at compilation time, but none of them are referenced by the default imports. Even for those, often we will reuse ClassLoaders and in this case can take advantage of some caching of resolutions.
 
 
It would be nice if we could insert a resolve strategy into `ResolveVisitor` that is used just before it attempts to resolve a type using the star imports. This strategy would then be able to efficiently resolve `T` to an API class and could be backed by a shared cache. The current hook in `ClassNodeResolver` is much too late to do this efficiently.

Yes. The resolution logic is a bit complex, but it should be possible to do something like that. However, I am not sure it would be more efficient than having an explicit list of all imports (not star imports).


Possibly, provided `ResolveVisitor` is changed to resolve the imports on demand. It looks like it resolves them all up front and there are thousands of classes in the Gradle API that would need resolving during compilation. Without such a fix, a list of all imports will be much, much slower than if we were to swap in our own (name, star imports) -> (class node) strategy.

Cedric Champeau

unread,
Sep 28, 2015, 3:42:31 AM9/28/15
to gradle-dev
2015-09-28 9:33 GMT+02:00 Adam Murdoch <ad...@gradle.com>:


On Mon, Sep 28, 2015 at 5:20 PM, Cedric Champeau <ced...@gradle.com> wrote:

The default imports are the same for all the scripts, as is the API that is visible to scripts. So, the solution should take advantage of the fact that given a name `T`, if this name resolves to an API class in one script, then `T` should resolve to that class in every other script. And if `T` does not resolve to an API class in one script, the `T` should not resolve to any class in every other script.

Is it really true, in terms of how it is currently implemented? My understanding is that we have 2 steps: 1 for being able to load the "buildscript" classpath, and the 2d one for the whole scripts. This imply at least 2 different classloaders, so a class that doesn't exist in one classloader (a class used in a plugin) might not be visible in another (the buildscript part).

Yes it’s true. The Gradle API is visible in both cases and is exactly the same for all script compilation, and so we can reuse the result of resolving a name to an API class.

There’s usually other, script specific, classes that are visible to the script at compilation time, but none of them are referenced by the default imports. Even for those, often we will reuse ClassLoaders and in this case can take advantage of some caching of resolutions.
 
That's true, but what I mean is that resolution is more complex than just having a list of "know classes" to make it faster. Say for example that the build script defines a class names "Jar". The resolver would prefer that one over one defined into imports. Same for an inner class which name is "Jar". That is to say, that for a given "literal", there's an order of execution of rules (which is admittedly complex), and some of them are recursive. 
 
It would be nice if we could insert a resolve strategy into `ResolveVisitor` that is used just before it attempts to resolve a type using the star imports. This strategy would then be able to efficiently resolve `T` to an API class and could be backed by a shared cache. The current hook in `ClassNodeResolver` is much too late to do this efficiently.

Yes. The resolution logic is a bit complex, but it should be possible to do something like that. However, I am not sure it would be more efficient than having an explicit list of all imports (not star imports).


Possibly, provided `ResolveVisitor` is changed to resolve the imports on demand. It looks like it resolves them all up front and there are thousands of classes in the Gradle API that would need resolving during compilation. Without such a fix, a list of all imports will be much, much slower than if we were to swap in our own (name, star imports) -> (class node) strategy.

Yes, but as I said given the precedence rules that exist (look in the module first, then inner classes, then imports, then static imports, then star imports, ...) it is not a trivial thing. That's definitely worth investigating though. I also noticed that the resolve visitor in Groovy caches misses, but it caches a lot of things, like the dumb java$lang$ class misses. The Groovy compiler itself would benefit from a smarter resolution strategy. 

Adam Murdoch

unread,
Sep 30, 2015, 2:51:58 PM9/30/15
to gradl...@googlegroups.com
On Mon, Sep 28, 2015 at 5:42 PM, Cedric Champeau <ced...@gradle.com> wrote:


2015-09-28 9:33 GMT+02:00 Adam Murdoch <ad...@gradle.com>:


On Mon, Sep 28, 2015 at 5:20 PM, Cedric Champeau <ced...@gradle.com> wrote:

The default imports are the same for all the scripts, as is the API that is visible to scripts. So, the solution should take advantage of the fact that given a name `T`, if this name resolves to an API class in one script, then `T` should resolve to that class in every other script. And if `T` does not resolve to an API class in one script, the `T` should not resolve to any class in every other script.

Is it really true, in terms of how it is currently implemented? My understanding is that we have 2 steps: 1 for being able to load the "buildscript" classpath, and the 2d one for the whole scripts. This imply at least 2 different classloaders, so a class that doesn't exist in one classloader (a class used in a plugin) might not be visible in another (the buildscript part).

Yes it’s true. The Gradle API is visible in both cases and is exactly the same for all script compilation, and so we can reuse the result of resolving a name to an API class.

There’s usually other, script specific, classes that are visible to the script at compilation time, but none of them are referenced by the default imports. Even for those, often we will reuse ClassLoaders and in this case can take advantage of some caching of resolutions.
 
That's true, but what I mean is that resolution is more complex than just having a list of "know classes" to make it faster. Say for example that the build script defines a class names "Jar". The resolver would prefer that one over one defined into imports. Same for an inner class which name is "Jar". That is to say, that for a given "literal", there's an order of execution of rules (which is admittedly complex), and some of them are recursive. 

We wouldn’t need to change any of that. The ordering can stay the same. We just want to take the part of the resolution process that (effectively) takes a type name and a set of star imports and resolves it to a class node, and replace it with our own look up. That’s it. This isn’t complex, as all the hard work has already been done in ResolveVisitor and friends.

 
 
It would be nice if we could insert a resolve strategy into `ResolveVisitor` that is used just before it attempts to resolve a type using the star imports. This strategy would then be able to efficiently resolve `T` to an API class and could be backed by a shared cache. The current hook in `ClassNodeResolver` is much too late to do this efficiently.

Yes. The resolution logic is a bit complex, but it should be possible to do something like that. However, I am not sure it would be more efficient than having an explicit list of all imports (not star imports).


Possibly, provided `ResolveVisitor` is changed to resolve the imports on demand. It looks like it resolves them all up front and there are thousands of classes in the Gradle API that would need resolving during compilation. Without such a fix, a list of all imports will be much, much slower than if we were to swap in our own (name, star imports) -> (class node) strategy.

Yes, but as I said given the precedence rules that exist (look in the module first, then inner classes, then imports, then static imports, then star imports, ...) it is not a trivial thing.

Again, we don’t need to change this at all. The hard stuff has already been done. We just want to snip out a very specific step and replace it with our own.

Rather than resolve the fully qualified imports on demand, another approach would be to allow the (full qualified imports) -> (class nodes) strategy to be replaced with our own and we can simply mix in all the API classes into this result and cache all the API class nodes for the next compilation.


 
That's definitely worth investigating though. I also noticed that the resolve visitor in Groovy caches misses, but it caches a lot of things, like the dumb java$lang$ class misses. The Groovy compiler itself would benefit from a smarter resolution strategy. 

I don’t really care too much about that. I just want resolution of our API types to be faster (and it can pretty easily be from my perspective).

Cedric Champeau

unread,
Oct 1, 2015, 5:30:52 AM10/1/15
to gradle-dev
 
That's definitely worth investigating though. I also noticed that the resolve visitor in Groovy caches misses, but it caches a lot of things, like the dumb java$lang$ class misses. The Groovy compiler itself would benefit from a smarter resolution strategy. 

I don’t really care too much about that. I just want resolution of our API types to be faster (and it can pretty easily be from my perspective).

Well, you should care ;) What I mean is that basically the resolver is somehow broken. It does way too much, in a weird order. So fixing this would improve the performance of Gradle. For example, for a simple import, the resolver will try to resolve it against the default groovy imports first, which is totally dumb. So we (my Groovy committer hat on) have to fix it, and we also should take advantage of that to provide more hooks for performance optimizations like the one you're asking for.

Reply all
Reply to author
Forward
0 new messages