See here: http://zeroturnaround.com/blog/rjc201/
So a solution would be to use a size-limited LRU cache with weak references and a class-loader per generated class inside of the GroovyScriptEngine.
Probably put up an issue on the github repo but I'm not sure how likely it is that those are fixed, also there is no way of "removing" a script from that cache (regardless of the gc-classloader leak issues).
Cheers
Michael
Am 12.12.2011 um 12:36 schrieb James Thornton:
> Hey Michael -
>
> Thanks for the info. I looked at the GSE source to try and understand
> the problem better. Please let me know if these are the problem areas
> you are referring to...
>
>
>> But it doesn't really help us as new scripts always result in new classes,
>> even when just a single character is changed.
>
> Is this what you mean when you say, "new scripts always result in new classes"?
>
> https://github.com/russel/groovy/blob/master/src/main/org/codehaus/groovy/jsr223/GroovyScriptEngineImpl.java#L109
>
>
>> And he didn't look into the GSE implementation - where this scriptCache resides :(
>
> And is this the scriptCache (classMap) that's causing the problem by
> growing unbounded?:
>
> https://github.com/russel/groovy/blob/master/src/main/org/codehaus/groovy/jsr223/GroovyScriptEngineImpl.java#L338
>
> Thanks.
>
> - James
I think instead of having James as communication bridge in between you
and me (on the Groovy side) and potentially others, I thought I come
here to see if we cannot find a suitable solution - also because I
think James cannot answer all questions I have.
First the disclaimer, that I know a bit about neo4j, but never looked
into it in more detail and never worked with it. So for example I
don't know if you actually have a special reason to use the JSR223
engine here. If you could describe your requirements for the script
engine a little, I could think about what is the best solution.
How about it Michael?
bye blackdrag
On 12 Dez., 23:54, Michael Hunger <michael.hun...@neotechnology.com>
wrote:
sure, would love that.
Here is the scenario.
The tinkerpop stack offers a lower level graph traversal implementation called Pipes and higher level DSL's in different languages
Gremlin in Groovy was first.
We're including a server extension in neo4j-server that accepts gremlin strings as post data and executes them with the GroovyScriptEngine
(actually a GremlinScriptEngine but that is just a simple subclass).
The problem arises if users generate a lot of different scripts (> 500 is enough for medium sized heaps - 1G) which are all stored and hold onto by the script
engine. Our workaround so far are
* killing the script-engine every 500 requests. Disadvantage is that you loose all precompiled classes that are executed more often than once
* encouraging people to use groovy/gremlin variables which will reduce the # of different scripts instead of inlining them into the script code
I see 3 Problems there:
* cache is just a hashmap, not a weak-hashmap so it holds on to the scripts and script-classes
* cache is not a bound LRU cache so that older entries are not evicted
* there is a probably single classloader used for the classes so due to the classloader<->class bidirectional reference as long as the classloader is still used it can't be collected and also its classes
In the future we'd like to provide dynamic language server extension that are probably JSR223 based (but probably that comes with too many disadvantages).
Hope that explains our use-case.
Cheers
Michael
If we - in Groovy - talk about GroovyScriptEngine we mean the one in
groovy.util, you mean the jsr 223 engine. And I guess you use that one
because you wan to support multiple languages through the jsr 223
interface. The problem with that interface is, that it does bind your
hands quite a lot in terms of configuration.
> The problem arises if users generate a lot of different scripts (>
> 500 is enough for medium sized heaps - 1G) which are all stored and
> hold onto by the script engine. Our workaround so far are * killing
> the script-engine every 500 requests. Disadvantage is that you loose
> all precompiled classes that are executed more often than once *
> encouraging people to use groovy/gremlin variables which will reduce
> the # of different scripts instead of inlining them into the script
> code
so on one hand you want to hold on to scripts, on the other you don't
want too many.
> I see 3 Problems there:
>
> * cache is just a hashmap, not a weak-hashmap so it holds on to the
> scripts and script-classes
> * cache is not a bound LRU cache so that older entries are not evicted
> * there is a probably single classloader used for the classes so due
> to the classloader<->class bidirectional reference as long as the
> classloader is still used it can't be collected and also its classes
WeakHashMap is not the right structure, in a WeakHashMap the keys are
weak, the classes are stored as values. But we have a map for that,
since we use something like that to store meta classes. As for the
loader, GroovyClassLoader uses a new sub loader for each compilation, so
that part is no problem. But that doesn't help as long as something
holds onto the classes of course.
> In the future we'd like to provide dynamic language server extension
> that are probably JSR223 based (but probably that comes with too many
> disadvantages).
>
> Hope that explains our use-case.
I see. So you don't insist on jsr223. That is good ;) If I give you a
program using GroovyClassLoader or something like that, combined with a
LRU cache, would that be then ok for you?
bye blackdrag
--
Jochen "blackdrag" Theodorou - Groovy Project Tech Lead
blog: http://blackdragsview.blogspot.com/
german groovy discussion newsgroup: de.comp.lang.misc
For Groovy programming sources visit http://groovy-lang.org
Thanks a lot,
Sounds good, the current code is here:
https://github.com/neo4j/cypher-plugin
the gremlin-script-engine comes from the tinkerpop-stack:
https://github.com/tinkerpop/gremlin
Cheers
Michael
Sounds good, the current code is here: https://github.com/neo4j/cypher-plugin