Dynamic Languages and Multithreading on the JVM


John Wilson

Jun 4, 2007, 5:31:23 AM
to jvm-la...@googlegroups.com
I'm interested in what problems, if any, dynamic language implementers
have with threading on the JVM. The current Ng runtime implementation
takes a great deal of trouble to try to avoid blocking threads. I'm
wondering if I'm missing any tricks or if I'm worrying too much about
this problem.

Ng takes Groovy's approach to coexisting with Java on the JVM. Ng,
like Groovy, is an adjunct to Java not a replacement for it. Ng
supports optional typing. Ng classes are Java classes, can subclass
Java classes and interfaces, and implement Java interfaces. Java
classes can subclass Ng classes and interfaces and implement Ng
interfaces. This means that most of the "dynamic magic" has to exist
outside of the Ng class. (In fact, Ng has less magic in the base class
than Groovy - just a read only property called metaClass. There are no
invokeMethod, setMetaClass, get/setProperty methods).

There seem to be two ways of adding dynamic magic to what are essentially POJOs.

Firstly you can use wrapper classes to box the basic class and to add
extra behaviour. The objects are unboxed when passed to Java methods,
written to arrays, etc. and boxed when they are returned from Java
functions, retrieved from arrays, etc.

Secondly you can use a MetaClass. There is generally a one to one
mapping between a class and an instance of the MetaClass (in Groovy
this mapping can change as the program is executed, in Ng this mapping
is immutable.) Operations are not performed directly on the object but
rather the MetaClass is asked to perform the operation on the object.
This results in the following sequence of operations:

Get the class of the object
Get the corresponding MetaClass instance for this class
Call a method on the MetaClass instance passing the object and any
other information needed to perform the operation.
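
The three steps above can be sketched in Java roughly as follows. This
is only an illustration: the MetaClass interface, its invokeMethod
signature, ReflectiveMetaClass and Dispatch are hypothetical names, not
the actual Ng or Groovy API, and the reflective body is a placeholder
for whatever dispatch a real runtime performs.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface: the MetaClass performs operations on behalf of an object.
interface MetaClass {
    Object invokeMethod(Object receiver, String name, Object... args);
}

// Placeholder MetaClass that forwards to public Java methods via reflection.
final class ReflectiveMetaClass implements MetaClass {
    private final Class<?> type;

    ReflectiveMetaClass(Class<?> type) {
        this.type = type;
    }

    @Override
    public Object invokeMethod(Object receiver, String name, Object... args) {
        // Naive matching on name and argument count only; enough for a sketch.
        for (Method m : type.getMethods()) {
            if (m.getName().equals(name) && m.getParameterCount() == args.length) {
                try {
                    return m.invoke(receiver, args);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        throw new IllegalArgumentException("no method " + name + " on " + type.getName());
    }
}

final class Dispatch {
    // A plain strong map for brevity; it ignores the class unloading problem.
    private static final Map<Class<?>, MetaClass> REGISTRY = new ConcurrentHashMap<>();

    static Object call(Object receiver, String name, Object... args) {
        Class<?> type = receiver.getClass();                                      // get the class
        MetaClass mc = REGISTRY.computeIfAbsent(type, ReflectiveMetaClass::new);  // get its MetaClass
        return mc.invokeMethod(receiver, name, args);                             // ask the MetaClass
    }
}
```

With this in place a call site such as Dispatch.call("hello", "length")
goes through the MetaClass rather than invoking String.length()
directly, which is where the extra behaviour can be hooked in.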

There are, of course, strategies which allow the compiler to get the
MetaClass instance quicker - I'll go into some more detail later.

These two approaches are fundamentally very similar. There is a basic
requirement to have a way of mapping a class to something which
provides extra functionality to the Java class. It is this mapping
process which can be expensive and can provide a bottleneck in highly
threaded environments.

From here on in I will only talk about the use of MetaClasses. I
evaluated and rejected the wrapper approach early on in the design of
Ng so I'm not in a position to provide any views based on experience
on wrappers.

The problem of mapping classes to MetaClasses is, conceptually,
trivial. You just need some sort of HashMap with classes as keys and
MetaClass instances as values. However the simple approach introduces
two big problems:

1/ Putting classes into a simple HashMap means that there's a hard
reference to each class used by the program. This prevents any class
used by the dynamic language from being garbage collected. Using
WeakHashMap doesn't help because the value of the map entry (the
MetaClass instance) holds references to the key value (the class)
which in turn stops the class being collected. Experience with Groovy
shows that this can be a nasty problem. The natural way of
implementing Closures is by creating a class per Closure. This means
that Groovy (and Ng) programs have many small classes which are used
infrequently. Also it's very common for programmers to dynamically
compile "one-shot" scripts which are compiled, executed and discarded.
The consequence of this is that garbage collection of classes in long
running programs (at least in long running Groovy and Ng programs) is
vital if the system is to be stable (i.e. to avoid out of memory
errors every few hours or days).

2/ The Map has to be mutable and has to support the addition of a
class -> MetaClass mapping in one thread whilst other threads are
looking up other mappings. In particular if two threads attempt to
create a new mapping for a new class at the same time a single mapping
must be inserted into the Map and both threads must get the same
instance of the MetaClass back.

The second problem is the source of my concern about threading
performance: it's trivially easy to get the semantics we need with a
HashMap - we just synchronise the accessor and mutator methods and it
all works fine. The problem is that this solution causes the
class->MetaClass mapping process to be a choke point when the number
of active threads rises. It would be really nice to find a map
implementation which allowed concurrent update and access without
locking, where updates appeared immediately to all threads and which
allowed the garbage collection of the keys even if the values
contained references to the key.
java.util.concurrent.ConcurrentHashMap seems to offer some of this
(and it does the tricky stuff). I have yet to investigate this in
detail. (I want to have some more measurements of my existing
implementation before dropping it in and measuring the difference).
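
As a rough sketch of the second requirement (one mapping inserted, every
racing thread handed the same instance), the following uses
ConcurrentHashMap.computeIfAbsent. That method arrived in Java 8; at the
time of this thread putIfAbsent would be the nearest equivalent.
StubMetaClass and ConcurrentRegistry are hypothetical names. Note that
this does nothing for the first problem: the map still holds strong
references to every class used as a key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a MetaClass; only its identity matters here.
final class StubMetaClass {
    final Class<?> type;

    StubMetaClass(Class<?> type) {
        this.type = type;
    }
}

final class ConcurrentRegistry {
    private final ConcurrentHashMap<Class<?>, StubMetaClass> map = new ConcurrentHashMap<>();

    // Counts how many MetaClass instances were actually constructed.
    final AtomicInteger created = new AtomicInteger();

    StubMetaClass lookup(Class<?> type) {
        // ConcurrentHashMap applies the mapping function at most once per key,
        // so threads racing on a new class all receive the same instance.
        return map.computeIfAbsent(type, t -> {
            created.incrementAndGet();
            return new StubMetaClass(t);
        });
    }
}
```

Reads of existing entries do not block, so lookups of already
registered classes no longer queue up behind a single lock.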

At the moment I have a single canonical HashMap which holds
class->MetaClass mappings. Access is fully synchronised so it's slow,
simple and safe. I have two classes of strategy to avoid this data
structure killing performance in highly multi threaded environments:

1/ Avoid looking things up in the Map

2/ Avoid putting mappings into the Map

i.e. avoid using the map if at all possible :)

The mapping between class and MetaClass instance is immutable. If the
behaviour of the class changes then the behaviour of the MetaClass
instance is changed. This means that the compiler is free to hoist
MetaClass instances out of loops, for example. It also means that the
mapping can be freely and safely cached (see below).

All NgObjects (i.e. the bulk of objects declared in Ng programs) have
a getMetaClass() method. This is always used to get the associated
MetaClass so the Map is never accessed for these objects. Note however
that we still need to get the mapping in the constructor. With things
like int objects (the Ng object which wraps int values) it is very
common for the object to be constructed, have a single operation
performed on it and then be immediately discarded. In this quite
common case the cost of looking up the mapping in the constructor is
a significant overhead. I will cover how I avoid the look-up in the
constructor later. I have some techniques for avoiding the need to
always wrap the primitive but these are outside the scope of this
discussion.

I have a per thread unsynchronised HashMap which is used to look up
mappings. This is used as the first port of call when we want the
MetaClass for a non NgObject. Getting the Map from ThreadLocal storage
is expensive but the compiler can cache the thread context data
structure and incur that cost only once in a method.

If the per thread HashMap does not contain the key then the class is
introspected looking for a public static final field called
ngMetaClass. If this exists and is the correct type then this value is
inserted into the thread local HashMap and is returned to the caller.
Otherwise the Canonical HashMap is asked to provide the MetaClass and
that mapping is put into the thread local HashMap before the result is
returned to the caller.
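
The lookup order described above (per thread map, then the static
ngMetaClass field, then the canonical map) might look something like the
sketch below. Only the field name ngMetaClass comes from the
description; CachedMetaClass, MetaClassLookup and Point are hypothetical
names, and the per thread map here uses a plain HashMap with strong
references purely to keep the example short.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

// Hypothetical MetaClass type for this sketch.
class CachedMetaClass {
    final Class<?> type;

    CachedMetaClass(Class<?> type) {
        this.type = type;
    }
}

final class MetaClassLookup {
    // Last resort: the canonical registry, fully synchronised.
    private static final Map<Class<?>, CachedMetaClass> CANONICAL = new HashMap<>();

    // First port of call: a per thread, unsynchronised cache.
    private static final ThreadLocal<Map<Class<?>, CachedMetaClass>> LOCAL =
            ThreadLocal.withInitial(HashMap::new);

    static CachedMetaClass metaClassFor(Class<?> type) {
        Map<Class<?>, CachedMetaClass> cache = LOCAL.get();
        CachedMetaClass mc = cache.get(type);
        if (mc == null) {
            mc = fromStaticField(type);      // second: public static final ngMetaClass
            if (mc == null) {
                mc = fromCanonical(type);    // third: the shared registry
            }
            cache.put(type, mc);
        }
        return mc;
    }

    private static CachedMetaClass fromStaticField(Class<?> type) {
        try {
            Field f = type.getField("ngMetaClass");   // public fields only
            int mods = f.getModifiers();
            if (Modifier.isStatic(mods) && Modifier.isFinal(mods)
                    && CachedMetaClass.class.isAssignableFrom(f.getType())) {
                return (CachedMetaClass) f.get(null);
            }
        } catch (ReflectiveOperationException e) {
            // No usable field: fall through to the canonical registry.
        }
        return null;
    }

    private static CachedMetaClass fromCanonical(Class<?> type) {
        synchronized (CANONICAL) {
            return CANONICAL.computeIfAbsent(type, CachedMetaClass::new);
        }
    }
}

// Example class that carries its own MetaClass in the expected field.
class Point {
    public static final CachedMetaClass ngMetaClass = new CachedMetaClass(Point.class);
}
```

Each thread pays for the reflective check and the synchronised lookup at
most once per class; after that the answer comes from its own map.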

NgObject has a constructor which takes a MetaClass instance so that
the metaClass property can be set without accessing the registry; the
no-args constructor uses the registry.

The sum total of this is that Ng goes to a great deal of trouble to
avoid using its central MetaClass registry (it is used once per thread
for POJOs, probably never for Ng Objects). In doing so it incurs a
performance hit in having to use thread local storage (though that can,
in principle, be ameliorated by a clever compiler).

My questions to other implementers are:

1/ Do you see multithreaded performance as being an issue?

2/ If so what do you do to solve the problem (assuming you have the
problem in the first place!)

John Wilson

Charles Oliver Nutter

Jun 4, 2007, 2:08:54 PM
to jvm-la...@googlegroups.com
John Wilson wrote:
> There seem to be two ways of adding dynamic magic to what are essentially POJOs.

JRuby uses both of these techniques...boxing for non-Ruby Java types and
"just" a metaclass for Ruby types. Of course the boxed Java objects
still just use the same metaclass mechanism, since they then appear like
normal Ruby objects. But there's overhead there, about 2X the cost of
simply calling standard Ruby objects. That's going to be a post 1.0 area
of focus.

We also have some fast-path dispatching built into the core JRuby types
to speed primitive operations (Strings, Arrays, Hashes) that I mentioned
before. This is another large reason why dispatching to those core types
is far faster than dispatching to Java objects.

> The problem of mapping classes to MetaClasses is, conceptually,
> trivial. You just need some sort of HashMap with classes as keys and
> MetaClass instances as values. However the simple approach introduces
> two big problems:
>
> 1/ Putting classes into a simple HashMap means that there's a hard
> reference to each class used by the program. This prevents any class


Two possibilities here:
- You could build your own key. I know RubyCLR (a C Ruby to CLR bridge)
used a string as the key. This is simplistic, but it wouldn't be
difficult for you to use an Integer representing the hash value of the
class as your key instead, or a custom data type that's more selective.
- You can build a double-weak map too, provided the metaclass has a
pointer to the class. Then only if the key (the metaclass) gets
collected will the class be eligible for collection and you avoid the
leak potential.

> used by the dynamic language from being garbage collected. Using
> WeakHashMap doesn't help because the value of the map entry (the
> MetaClass instance) holds references to the key value (the class)
> which in turn stops the class being collected. Experience with Groovy
> shows that this can be a nasty problem. The natural way of
> implementing Closures is by creating a class per Closure. This means

That's one way, certainly. In JRuby we do not implement them as a
class-per-closure, since we pass in appropriate state. In fact, in
JRuby's AOT mode, there's a single class created per .rb file. The only
places we generate a class-per-function is in JIT mode (for obvious
reasons) and when generating "fast" bindings for Java-based methods
("Callable" objects that invoke directly, rather than using reflection).
So using a class-per-closure is only one way to do this.

In fact, I avoided a class-per-closure for a very specific reason:
constructing yet another object per closure. As it stands now, in JRuby
the only objects constructed per-call are the objects directly required
for that call like variable scope and call frame data structures.

To address the GC issue, we generate a classloader-per-class for all
classes generated at runtime. It's gross, but it's currently the only
way to ensure the classes can GC, as you probably know.

> that Groovy (and Ng) programs have many small classes which are used
> infrequently. Also it's very common for programmers to dynamically
> compile "one-shot" scripts which are compiled, executed and discarded.
> The consequence of this is that garbage collection of classes in long
> running programs (at least in long running Groovy and Ng programs) is
> vital if the system is to be stable (i.e. to avoid out of memory
> errors every few hours or days).
>
> 2/ The Map has to be mutable and has to support the addition of a
> class -> MetaClass mapping in one thread whilst other threads are
> looking up other mappings. In particular if two threads attempt to

...


> java.util.concurrent.ConcurrentHashMap seems to offer some of this
> (and it does the tricky stuff). I have yet to investigate this in
> detail. (I want to have some more measurements of my existing
> implementation before dropping it in and measuring the difference).

I would strongly urge you to look at ConcurrentHashMap and friends.
We're using it extensively in JRuby for common data structures (like the
metaclass's map of method objects) and have been very happy with it.

> At the moment I have a single canonical HashMap which holds
> class->MetaClass mappings. Access is fully synchronised so it's slow,
> simple and safe. I have two classes of strategy to avoid this data
> structure killing performance in highly multi threaded environments:

...


> The mapping between class and MetaClass instance is immutable. If the
> behaviour of the class changes then the behaviour of the MetaClass
> instance is changed. This means that the compiler is free to hoist
> MetaClass instances out of loops, for example. It also means that the
> mapping can be freely and safely cached (see below).

The strategy we've taken with JRuby, which is very dynamic all the time
(i.e. all classes are always open...existing objects can even gain new
methods or have new anonymous metaclasses inserted in their hierarchy)
is to just promise that the internal data structures governing JRuby
will remain structurally sound. There's a whole other raft of issues
with being able to modify classes across threads, and I honestly believe
it's not the responsibility of the language implementer to address this
issue. As with any shared data structure, we expect that the consumers
of the language will do their own synchronization to ensure they don't
step on their own threads. And perhaps that's the dividing line...if the
language consumer could reasonably be expected to do their own
synchronization...we don't do it for them.

> All NgObjects (i.e. the bulk of objects declared in Ng programs) have
> a getMetaClass() method. This is always used to get the associated
> MetaClass so the Map is never accessed for these objects. Note however
> that we still need to get the mapping in the constructor. With things
> like int objects (the Ng object which wraps int values) it is very
> common for the object to be constructed, have a single operation
> performed on it and then be immediately discarded. In this quite
> common case the cost of looking up the mapping in the constructor is
> a significant overhead. I will cover how I avoid the look-up in the
> constructor later. I have some techniques for avoiding the need to
> always wrap the primitive but these are outside the scope of this
> discussion.

We have a centralized per-runtime class that gets passed to almost all
calls, avoiding the cost of looking up metaclasses in a global hash or
going to a threadlocal. Last year at this time, our single largest
bottleneck was threadlocal lookups. Now it doesn't even register.

> I have a per thread unsynchronised HashMap which is used to look up
> mappings. This is used as the first port of call when we want the
> MetaClass for a non NgObject. Getting the Map from ThreadLocal storage
> is expensive but the compiler can cache the thread context data
> structure and incur that cost only once in a method.

I'd be careful assuming what the compiler can or should do here. You
have one benefit, in that Ng metaclasses can't be replaced. In JRuby we
don't have that benefit, so no metaclass caching can be done in most
cases. But there's perhaps a few larger issues here:

- A map for every class to its metaclass for every thread that calls
into the system seems like it could consume a lot of memory quickly.
- You have that many more threads referencing those classes and
metaclasses, which makes your GC and leak worries a lot more perplexing.
- You have to make sure that all hits to any thread's metaclass map are
synchronized, since another thread may be in the process of constructing
the metaclass already. So you may be able to cache classes in each
thread, but if there are many new threads entering the system they'll
pay a severe cost accessing all those metaclasses for the first time.
- Where will the compiler cache the metaclasses it finds? More
references means more memory use, more places preventing metaclasses
from being GCed, and so on.

I'd like to hear more in this area.

> If the per thread HashMap does not contain the key then the class is
> introspected looking for a public static final field called
> ngMetaClass. If this exists and is the correct type then this value is
> inserted into the thread local HashMap and is returned to the caller.
> Otherwise the Canonical HashMap is asked to provide the MetaClass and
> that mapping is put into the thread local HashMap before the result is
> returned to the caller.
>
> NgObject has a constructor which takes a MetaClass instance so that
> the metaClass property can be set without accessing the registry; the
> no-args constructor uses the registry.
>
> The sum total of this is that Ng goes to a great deal of trouble to
> avoid using its central MetaClass registry (it is used once per thread
> for POJOs, probably never for Ng Objects). In doing so it incurs a
> performance hit in having to use thread local storage (though that can,
> in principle, be ameliorated by a clever compiler).
>
> My questions to other implementers are:
>
> 1/ Do you see multithreaded performance as being an issue?

Our case is somewhat unique, since Ruby normally supports only green
threads. So moving from Ruby to JRuby is a double-edged sword in the
threading department. On the one hand, you gain true concurrency, which
has been sorely lacking in the C implementation. On the other hand, many
Ruby libraries have been designed around spinning up many lightweight
threads, which now have a substantial additional cost under JRuby (for
example, Ruby's network library spins up a thread for every packet
read...to check whether the read times out; if it sounds gross, you're
right, it is.) We've built in the ability to pool threads (making thread
creation almost as cheap as green threads), but it's not 100% complete
and remains unsupported (officially) for JRuby 1.0.

As far as general threading performance...we've eliminated the primary
bottleneck we ever saw, which was getting at the threadlocal that
represents a Ruby thread within the runtime, by passing it in most cases
as a parameter. So whenever we can pass threadlocal data as a parameter,
we do so. That's reduced thread-related performance problems to an
insignificant minority of our concerns. Our primary performance problem
at present is all the per-call overhead we deal with (way too much
object construction, argument arrays for all arities, variable scoping
data structures, etc).

> 2/ If so what do you do to solve the problem (assuming you have the
> problem in the first place!)

Bottom line: ensure data structures are basically safe and then pass as
much as possible along the call stack. You get "free" threadlocals when
you pass arguments, after all, and threadlocals used to be our number
one bottleneck.
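
A minimal sketch of that idea, assuming a hypothetical ThreadContext
class (not JRuby's actual type): the ThreadLocal is consulted once at
the entry point and the result then travels down the call stack as an
ordinary argument.

```java
// Hypothetical per-thread runtime state; the name and field are assumptions.
final class ThreadContext {
    int callDepth;
}

final class ContextPassing {
    private static final ThreadLocal<ThreadContext> CURRENT =
            ThreadLocal.withInitial(ThreadContext::new);

    // Entry point: pay for the ThreadLocal lookup once...
    static int fib(int n) {
        return fib(CURRENT.get(), n);
    }

    // ...then hand the context down the call stack as a plain parameter.
    static int fib(ThreadContext ctx, int n) {
        ctx.callDepth++;
        try {
            return n < 2 ? n : fib(ctx, n - 1) + fib(ctx, n - 2);
        } finally {
            ctx.callDepth--;
        }
    }
}
```

Every recursive call can reach the per-thread state without touching
the ThreadLocal again; the cost is one extra argument per call.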

- Charlie

John Wilson

Jun 4, 2007, 3:14:27 PM
to jvm-la...@googlegroups.com
Charles, many thanks for the very detailed reply.

I'm going to respond in chunks to avoid producing unmanageably large emails.


On 6/4/07, Charles Oliver Nutter <charles...@sun.com> wrote:
>
> John Wilson wrote:
> > There seem to be two ways of adding dynamic magic to what are essentially POJOs.
>
> JRuby uses both of these techniques...boxing for non-Ruby Java types and
> "just" a metaclass for Ruby types. Of course the boxed Java objects
> still just use the same metaclass mechanism, since they then appear like
> normal Ruby objects. But there's overhead there, about 2X simply calling
> standard Ruby objects. That's going to be a post 1.0 area of focus.

I box primitive arithmetic and logical types but I have methods on the
MetaClass which allow me to handle them unboxed in some circumstances.

I box objects (both POJOs and NG Objects) when they are assigned to a
typed variable or cast to a type when passed as a parameter. There are
some circumstances (when the type is final) when I can avoid that but,
in general, using typed variables will slow the program down
(arithmetic and logical primitives are an exception to this rule).

Method dispatch on boxed objects does not incur such a high overhead
but that's probably because my method dispatch is pretty slow at the
moment :)

>
> We also have some fast-path dispatching built into the core JRuby types
> to speed primitive operations (Strings, Arrays, Hashes) that I mentioned
> before. This is another large reason why dispatching to those core types
> is far faster than dispatching to Java objects.

I plan to do that sort of thing via custom MetaClasses but as the Ng
runtime supports open classes I have to deal with the fact that
arbitrary methods can be added and removed at arbitrary times. You
must have that problem too. Doesn't it impact your fast track dispatch?

>
> > The problem of mapping classes to MetaClasses is, conceptually,
> > trivial. You just need some sort of HashMap with classes as keys and
> > MetaClass instances as values. However the simple approach introduces
> > two big problems:
> >
> > 1/ Putting classes into a simple HashMap means that there's a hard
> > reference to each class used by the program. This prevents any class
>
>
> Two possibilities here:
> - You could build your own key. I know RubyCLR (a C Ruby to CLR bridge)
> used a string as the key. This is simplistic, but it wouldn't be
> difficult for you to use an Integer representing the hash value of the
> class as your key instead, or a custom data type that's more selective.
> - You can build a double-weak map too, provided the metaclass has a
> pointer to the class. Then only if the key (the metaclass) gets
> collected will the class be eligible for collection and you avoid the
> leak potential.

I think the problem is not the key but the value. Using a String would
not allow the class to be GCd, would it? The MetaClass would still be
strongly reachable therefore the Class would never be collected. Am I
misunderstanding your suggestion here?

I use a WeakHashMap with the value wrapped in a WeakReference for my
ThreadLocal MetaClass Registry. In my very limited load tests this
seems to behave quite well and sheds entries as expected. At some
later time I'd like to experiment with more aggressive techniques
(clear the Map completely if it gets bigger than some limit, for
example).
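
A WeakHashMap whose values are wrapped in WeakReferences, as described
above, could be sketched like this. DoubleWeakCache and its factory
parameter are hypothetical names, not Ng's actual registry. Wrapping the
values is also the workaround the WeakHashMap documentation suggests
when values refer back to their keys.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

final class DoubleWeakCache<V> {
    // Weak keys: an entry can go away once its Class is no longer strongly reachable.
    // Weak values: the value's reference back to the Class does not pin the key.
    private final Map<Class<?>, WeakReference<V>> map = new WeakHashMap<>();

    // Not thread-safe: meant to be owned by one thread, e.g. held in a ThreadLocal.
    V get(Class<?> type, Function<Class<?>, V> factory) {
        WeakReference<V> ref = map.get(type);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Either never cached or already collected: build it (again).
            value = factory.apply(type);
            map.put(type, new WeakReference<>(value));
        }
        return value;
    }
}
```

The trade-off is that a value may be collected while its class is still
alive, in which case the next lookup has to rebuild it.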


John Wilson

Samuele Pedroni

Jun 4, 2007, 3:33:45 PM
to jvm-la...@googlegroups.com
Charles Oliver Nutter wrote:
>
> - You can build a double-weak map too, provided the metaclass has a
> pointer to the class. Then only if the key (the metaclass) gets
> collected will the class be eligible for collection and you avoid the
> leak potential.
>
In Jython we have similar problems and we use such a double-weak
approach to map Java classes to the corresponding synthetic Python type.
The problem is that what you really want is to avoid the expense of
creating the Python type, which is costly, so the desired effect would be
to have the type around as long as the class. But there is nothing in the
usage of the type that guarantees it will be kept alive reasonably long
enough and not incur a lot of recreations.
What one wants is really equivalent to having the class (or its
classloader, lifetime-wise these are equivalent) point to the type. In
the absence of a security manager there are possibly workarounds, but an
efficient solution for such a weak mapping from classes would be most
welcome.
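
For readers on later JDKs: java.lang.ClassValue, added in Java 7 (well
after this message was written), associates a lazily computed value
with a class, which is close to "having the class point to the type".
The sketch below only shows the shape of its use; SyntheticType and
TypeCache are hypothetical names, and whether it behaves well for a
given runtime is something to measure rather than assume.

```java
// Hypothetical stand-in for the expensive-to-build synthetic type.
final class SyntheticType {
    final Class<?> javaClass;

    SyntheticType(Class<?> javaClass) {
        this.javaClass = javaClass;
    }
}

final class TypeCache {
    private static final ClassValue<SyntheticType> TYPES = new ClassValue<SyntheticType>() {
        @Override
        protected SyntheticType computeValue(Class<?> type) {
            // Runs when a class is first looked up; get() returns the stored value afterwards.
            return new SyntheticType(type);
        }
    };

    static SyntheticType typeFor(Class<?> javaClass) {
        return TYPES.get(javaClass);
    }
}
```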

regards.

Charles Oliver Nutter

Jun 4, 2007, 3:41:11 PM
to jvm-la...@googlegroups.com
John Wilson wrote:
> Charles, many thanks for the very detailed reply.
>
> I'm going to respond in chunks to avoid producing unmanageably large emails.
>
>
> On 6/4/07, Charles Oliver Nutter <charles...@sun.com> wrote:
>> John Wilson wrote:
>>> There seem to be two ways of adding dynamic magic to what are essentially POJOs.
>> JRuby uses both of these techniques...boxing for non-Ruby Java types and
>> "just" a metaclass for Ruby types. Of course the boxed Java objects
>> still just use the same metaclass mechanism, since they then appear like
>> normal Ruby objects. But there's overhead there, about 2X simply calling
>> standard Ruby objects. That's going to be a post 1.0 area of focus.
>
> I box primitive arithmetic and logical types but I have methods on the
> MetaClass which allow me to handle them unboxed in some circumstances.

Using parameter-doubling I assume? We're planning this change for JRuby
in the future, to allow the use of "lightweight" objects that don't have
wrappers. Same basic principle, but in our case it's more to eliminate
memory overhead (and to some extent call abstraction) than to speed
anything up.

> I box objects (both POJOs and NG Objects) when they are assigned to a
> typed variable or cast to a type when passed as a parameter. There are
> some circumstances (when the type is final) when I can avoid that but,
> in general, using typed variables will slow the program down
> (arithmetic and logical primitives are an exception to this rule).

I've noticed this is the case in Groovy too. My first fib benchmarks
were really slow until I removed all types. So even if Groovy does
support some kind of optional static typing, you pay a major penalty for
using it.

I'd be interested in knowing if there are any papers or techniques for
optimizing optional-static-typed languages out there. I don't know of
any, nor have I done much of a search.

>> We also have some fast-path dispatching built into the core JRuby types
>> to speed primitive operations (Strings, Arrays, Hashes) that I mentioned
>> before. This is another large reason why dispatching to those core types
>> is far faster than dispatching to Java objects.
>
> I plan to do that sort of thing via custom MetaClasses but as the Ng
> runtime supports open classes I have to deal with the fact that
> arbitrary methods can be added and removed at arbitrary times. You
> must have that problem too. Doesn't it impact your fast track dispatch?

If a method is overridden the fast dispatch table entry for that method
gets a "zero". Then future calls will go through typical slow-path
dispatch. In general, however, the hardest-hit methods on the core
classes are almost never overridden.
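
One way to picture the "zero" entry, as a sketch only: FixnumDispatch,
OP_PLUS and the maps below are hypothetical and are not JRuby's actual
data structures. A non-zero index selects an inlined built-in; zero
sends the call through the ordinary method table.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BinaryOperator;

final class FixnumDispatch {
    static final int OP_PLUS = 1;   // hypothetical switch index for "+"

    // One entry per method name; 0 means "no fast path, use the method table".
    private final Map<String, Integer> fastIndex = new ConcurrentHashMap<>();
    private final Map<String, BinaryOperator<Long>> methods = new ConcurrentHashMap<>();

    FixnumDispatch() {
        fastIndex.put("+", OP_PLUS);
        methods.put("+", Long::sum);
    }

    // User code redefines a method: install it, then zero the fast-path entry.
    void define(String name, BinaryOperator<Long> body) {
        methods.put(name, body);
        fastIndex.put(name, 0);
    }

    long call(String name, long self, long arg) {
        switch (fastIndex.getOrDefault(name, 0)) {
            case OP_PLUS:
                return self + arg;                            // fast path: built-in inlined
            default:
                BinaryOperator<Long> m = methods.get(name);   // slow path: look the method up
                if (m == null) {
                    throw new IllegalArgumentException("undefined method: " + name);
                }
                return m.apply(self, arg);
        }
    }
}
```

Until something calls define("+", ...), every "+" takes the inlined
branch; afterwards the same call site picks up the new definition
through the slow path.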

>> Two possibilities here:
>> - You could build your own key. I know RubyCLR (a C Ruby to CLR bridge)
>> used a string as the key. This is simplistic, but it wouldn't be
>> difficult for you to use an Integer representing the hash value of the
>> class as your key instead, or a custom data type that's more selective.
>> - You can build a double-weak map too, provided the metaclass has a
>> pointer to the class. Then only if the key (the metaclass) gets
>> collected will the class be eligible for collection and you avoid the
>> leak potential.
>
> I think the problem is not the key but the value. Using a String would
> not allow the class to be GCd, would it? The MetaClass would still be
> strongly reachable therefore the Class would never be collected. Am I
> misunderstanding your suggestion here?
>
> I use a WeakHashMap with the value wrapped in a WeakReference for my
> ThreadLocal MetaClass Registry. In my very limited load tests this
> seems to behave quite well and sheds entries as expected. At some
> later time I'd like to experiment with more aggressive techniques
> (clear the Map completely if it gets bigger than some limit, for
> example).

I wasn't very clear...your use of weak refs on both sides is what I
meant, and it sounds like you're already doing that.

- Charlie

John Wilson

Jun 4, 2007, 4:21:53 PM
to jvm-la...@googlegroups.com
On 6/4/07, Charles Oliver Nutter <charles...@sun.com> wrote:
> > used by the dynamic language from being garbage collected. Using
> > WeakHashMap doesn't help because the value of the map entry (the
> > MetaClass instance) holds references to the key value (the class)
> > which in turn stops the class being collected. Experience with Groovy
> > shows that this can be a nasty problem. The natural way of
> > implementing Closures is by creating a class per Closure. This means
>
> That's one way, certainly. In JRuby we do not implement them as a
> class-per-closure, since we pass in appropriate state. In fact, in
> JRuby's AOT mode, there's a single class created per .rb file. The only
> places we generate a class-per-function is in JIT mode (for obvious
> reasons) and when generating "fast" bindings for Java-based methods
> ("Callable" objects that invoke directly, rather than using reflection).
> So using a class-per-closure is only one way to do this.
>
> In fact, I avoided a class-per-closure for a very specific reason:
> constructing yet another object per closure. As it stands now, in JRuby
> the only objects constructed per-call are the objects directly required
> for that call like variable scope and call frame data structures.

This is very interesting.

So if your class has n Closures do you generate n methods on the
class? When you pass a Closure as a parameter to a Java method, for
example, do you have some kind of adapter object which forwards the
call?

I'm not completely sure that this would work on Ng (or Groovy, for
that matter). As Closures are objects they can have their behaviour
modified so they need to have a MetaClass associated with them.


>
> To address the GC issue, we generate a classloader-per-class for all
> classes generated at runtime. It's gross, but it's currently the only
> way to ensure the classes can GC, as you probably know.

Yes but I can't really see any other way of dealing with this.


>
> > that Groovy (and Ng) programs have many small classes which are used
> > infrequently. Also it's very common for programmers to dynamically
> > compile "one-shot" scripts which are compiled, executed and discarded.
> > The consequence of this is that garbage collection of classes in long
> > running programs (at least in long running Groovy and Ng programs) is
> > vital if the system is to be stable (i.e. to avoid out of memory
> > errors every few hours or days).
> >
> > 2/ The Map has to be mutable and has to support the addition of a
> > class -> MetaClass mapping in one thread whilst other threads are
> > looking up other mappings. In particular if two threads attempt to
> ...
> > java.util.concurrent.ConcurrentHashMap seems to offer some of this
> > (and it does the tricky stuff). I have yet to investigate this in
> > detail. (I want to have some more measurements of my existing
> > implementation before dropping it in and measuring the difference).
>
> I would strongly urge you to look at ConcurrentHashMap and friends.
> We're using it extensively in JRuby for common data structures (like the
> metaclass's map of method objects) and have been very happy with it.

Is there a concurrent equivalent of WeakHashMap by any chance?

>
> > At the moment I have a single canonical HashMap which holds
> > class->MetaClass mappings. Access is fully synchronised so it's slow,
> > simple and safe. I have two classes of strategy to avoid this data
> > structure killing performance in highly multi threaded environments:
> ...
> > The mapping between class and MetaClass instance is immutable. If the
> > behaviour of the class changes then the behaviour of the MetaClass
> > instance is changed. This means that the compiler is free to hoist
> > MetaClass instances out of loops, for example. It also means that the
> > mapping can be freely and safely cached (see below).
>
> The strategy we've taken with JRuby, which is very dynamic all the time
> (i.e. all classes are always open...existing objects can even gain new
> methods or have new anonymous metaclasses inserted in their hierarchy)
> is to just promise that the internal data structures governing JRuby
> will remain structurally sound. There's a whole other raft of issues
> with being able to modify classes across threads, and I honestly believe
> it's not the responsibility of the language implementer to address this
> issue. As with any shared data structure, we expect that the consumers
> of the language will do their own synchronization to ensure they don't
> step on their own threads. And perhaps that's the dividing line...if the
> language consumer could reasonably be expected to do their own
> synchronization...we don't do it for them.

I certainly think that's a wise approach. However I think it's the
runtime's responsibility to ensure that a method added to a class by
one thread is immediately seen by all other threads, for example. Ng
and Groovy both support this kind of open class (with a different
language mechanism, of course). Groovy also has Categories which add
functions to classes but only for the current thread and only for the
duration of the execution of a closure. This is really very useful but
quite hard to implement efficiently. I plan to do something like this
in Ng. If I can manage it I'd also like any child threads spawned in
the Category closure to inherit the methods.
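
One way the "current thread only, plus child threads" scoping could be
approximated on the JVM is with InheritableThreadLocal. The sketch below is
only an assumption about how such a thing might look; CategoryScope and its
methods are hypothetical names, and a String stands in for a real method
implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class CategoryScope {
    // Each thread sees its own map. A child thread receives a copy of the
    // parent's map at the moment the child Thread object is constructed.
    private static final InheritableThreadLocal<Map<String, String>> EXTRA =
            new InheritableThreadLocal<Map<String, String>>() {
                @Override protected Map<String, String> initialValue() {
                    return Collections.emptyMap();
                }
                @Override protected Map<String, String> childValue(Map<String, String> parent) {
                    return new HashMap<String, String>(parent);
                }
            };

    // Makes methodName visible to the current thread while body runs,
    // then restores whatever was visible before.
    static void use(String methodName, String impl, Runnable body) {
        Map<String, String> saved = EXTRA.get();
        Map<String, String> extended = new HashMap<String, String>(saved);
        extended.put(methodName, impl);
        EXTRA.set(extended);
        try {
            body.run();
        } finally {
            EXTRA.set(saved);
        }
    }

    // Returns the scoped implementation, or null if none is in scope.
    static String find(String methodName) {
        return EXTRA.get().get(methodName);
    }
}
```

Note that a child thread keeps its copy even after the parent leaves the
scope, so this does not by itself limit the child to the duration of the
closure.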

>
> > All NgObjects (i.e. the bulk of objects declared in Ng programs) have
> > a getMetaClass() method. This is always used to get the associated
> > MetaClass so the Map is never accessed for these objects. Note however
> > that we still need to get the mapping in the constructor. With things
> > like int objects (the Ng object which wraps int values) it is very
> > common for the object to be constructed, have a single operation
> > performed on it and then be immediately discarded. In this quite
> > common case the cost of looking up the mapping in the constructor is
> > a significant overhead. I will cover how I avoid the look-up in the
> > constructor later. I have some techniques for avoiding the need to
> > always wrap the primitive but these are outside the scope of this
> > discussion.
>
> We have a centralized per-runtime class that gets passed to almost all
> calls, avoiding the cost of looking up metaclasses in a global hash or
> going to a threadlocal. Last year at this time, our single largest
> bottleneck was threadlocal lookups. Now it doesn't even register.

Hmmm. I must take a look at that.

>
> > I have a per thread unsynchronised HashMap which is used to look up
> > mappings. This is used as the first port of call when we want the
> > MetaClass for a non NgObject. Getting the Map from ThreadLocal storage
> > is expensive but the compiler can cache the thread context data
> > structure and incur that cost only once in a method.
>
> I'd be careful assuming what the compiler can or should do here. You
> have one benefit, in that Ng metaclasses can't be replaced. In JRuby we
> don't have that benefit, so no metaclass caching can be done in most
> cases. But there's perhaps a few larger issues here:

Although the MetaClass can't be replaced the MetaClass is really just
a wrapper round an inner MetaClass which can be replaced. So I cheat a
bit here :)

>
> - A map for every class to its metaclass for every thread that calls
> into the system seems like it could consume a lot of memory quickly.
> - You have that many more threads referencing those classes and
> metaclasses, which makes your GC and leak worries a lot more perplexing.
> - You have to make sure that all hits to any thread's metaclass map are
> synchronized, since another thread may be in the process of constructing
> the metaclass already. So you may be able to cache classes in each
> thread, but if there are many new threads entering the system they'll
> pay a severe cost accessing all those metaclasses for the first time.
> - Where will the compiler cache the metaclasses it finds? More
> references means more memory use, more places preventing metaclasses
> from being GCed, and so on.
>
> I'd like to hear more in this area.

The Thread local Maps are simple caches. If it becomes a problem then
I'll just adopt a more draconian approach to memory management.

Let me explain the scheme at more length...

In general, Ng Objects never appear in the Canonical MetaClass
registry nor in the per thread registry cache. Normally the MetaClass
is created when the class is first loaded and the constructor just
uses that instance to initialise the metaClass property. In some cases
this does not happen (but I think I can fix that). In which case the
mapping appears in the per thread registry cache but not in the
Canonical MetaClass registry.

For POJOs the Canonical MetaClass registry is the place where the
metaclasses are created and held. It's expensive to access the
Canonical MetaClass registry but the thread local registry has to do
that only once at the first use of a POJO.

It's possible that this might cause a problem at thread startup. It's a
matter which can only be determined by measurement. If it is a problem
then it would be possible to give the thread local registry a "starter
pack" of mappings when it is initially created.
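
The two-level scheme described above can be sketched in Java roughly as
follows. This is an illustration under assumptions, not Ng's code:
PojoMetaClass, TwoLevelRegistry and the lookup counter are invented names,
and the counter exists only to show how often the slow path is taken.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the MetaClass of a POJO.
final class PojoMetaClass {
    final Class<?> theClass;
    PojoMetaClass(Class<?> theClass) { this.theClass = theClass; }
}

final class TwoLevelRegistry {
    // Canonical registry: slow, simple and safe (fully synchronised).
    private final Map<Class<?>, PojoMetaClass> canonical = new HashMap<>();
    private int canonicalLookups = 0;   // guarded by the lock on this

    // Per-thread cache: an unsynchronised HashMap, touched by one thread only.
    private final ThreadLocal<Map<Class<?>, PojoMetaClass>> cache =
            ThreadLocal.withInitial(HashMap::new);

    private synchronized PojoMetaClass canonicalLookup(Class<?> c) {
        canonicalLookups++;
        PojoMetaClass mc = canonical.get(c);
        if (mc == null) {
            mc = new PojoMetaClass(c);
            canonical.put(c, mc);
        }
        return mc;
    }

    PojoMetaClass lookup(Class<?> c) {
        Map<Class<?>, PojoMetaClass> local = cache.get();
        PojoMetaClass mc = local.get(c);
        if (mc == null) {
            // First use of this class on this thread: pay for the lock once.
            mc = canonicalLookup(c);
            local.put(c, mc);
        }
        return mc;
    }

    synchronized int canonicalLookups() { return canonicalLookups; }
}
```

Every per-thread map holds strong references to classes and metaclasses, so
the memory and GC concerns raised earlier apply to this sketch as written.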


Thanks


John Wilson

John Wilson

Jun 4, 2007, 4:25:18 PM6/4/07
to jvm-la...@googlegroups.com

Samuele,

I'd really like myObject.getClass().getMyMetaData() to be a cheap way
of getting, in my case, the MetaClass and, in your case, the Python
type.

Is that what you had in mind?

John Wilson

Jochen Theodorou

Jun 4, 2007, 4:49:54 PM6/4/07
to jvm-la...@googlegroups.com
John Wilson schrieb:
[...]

> So if your class has n Closures do you generate n methods on the
> class? When you pass a Closure as a parameter to a Java method, for
> example, do you have some kind of adapter object which forwards the
> call?
>
> I'm not completely sure that this would work on Ng (or Groovy, for
> that matter). As Closures are objects they can have their behaviour
> modified so they need to have a MetaClass associated with them

In Groovy we could have some kind of context object equal to the current
Closure object, but instead of calling doCall it would call the method
for the closure on the class. In the end there would not be fewer
objects, but fewer classes. As for local variables, they would not exist
as fields as in the current implementation; they would be given to the
method as "context". I don't think it would be too difficult to make
that change for Groovy.

[...]


>> I would strongly urge you to look at ConcurrentHashMap and friends.
>> We're using it extensively in JRuby for common data structures (like the
>> metaclass's map of method objects) and have been very happy with it.
>
> Is there a ConcurrentHashMap of WeakHashMap by any chance?

It shouldn't be too difficult to make one by reusing
ConcurrentHashMap with some extra classes for the keys and values.
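
A rough Java sketch of that idea, covering the key side only: keys are
wrapped in a WeakReference subclass that compares by identity, and dead
entries are purged via a ReferenceQueue. WeakConcurrentMap and WeakKey are
invented names, and this is an outline under those assumptions, not a tested
library.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;

final class WeakConcurrentMap<K, V> {
    // Key wrapper: weak reference plus a hash captured while the key is alive.
    private static final class WeakKey<T> extends WeakReference<T> {
        private final int hash;
        WeakKey(T referent, ReferenceQueue<T> queue) {
            super(referent, queue);
            this.hash = System.identityHashCode(referent);
        }
        @Override public int hashCode() { return hash; }
        @Override public boolean equals(Object o) {
            if (this == o) return true;          // lets a cleared key remove itself
            if (!(o instanceof WeakKey)) return false;
            Object mine = get();
            return mine != null && mine == ((WeakKey<?>) o).get();
        }
    }

    private final ConcurrentHashMap<WeakKey<K>, V> map = new ConcurrentHashMap<>();
    private final ReferenceQueue<K> queue = new ReferenceQueue<>();

    // Drops entries whose keys have been garbage collected.
    private void expunge() {
        Object dead;
        while ((dead = queue.poll()) != null) {
            map.remove(dead);
        }
    }

    V get(K key) {
        expunge();
        return map.get(new WeakKey<K>(key, null));   // throwaway lookup key
    }

    V putIfAbsent(K key, V value) {
        expunge();
        return map.putIfAbsent(new WeakKey<K>(key, queue), value);
    }

    int size() {
        expunge();
        return map.size();
    }
}
```

If the value holds a strong reference back to the key (as a MetaClass
normally would to its Class), the key can never be collected, which is
presumably why the values need their own wrapper class as well.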

bye blackdrag

--
Jochen "blackdrag" Theodorou
Groovy Tech Lead (http://groovy.codehaus.org)
http://blackdragsview.blogspot.com/

John Wilson

Jun 4, 2007, 4:48:30 PM6/4/07
to jvm-la...@googlegroups.com
On 6/4/07, Charles Oliver Nutter <charles...@sun.com> wrote:
>
> John Wilson wrote:
> > Charles, many thanks for the very detailed reply.
> >
> > I'm going to respond in chunks to avoid producing unmanageably large emails.
> >
> >
> > On 6/4/07, Charles Oliver Nutter <charles...@sun.com> wrote:
> >> John Wilson wrote:
> >>> There seem to be two ways of adding dynamic magic to what are essentially POJOs.
> >> JRuby uses both of these techniques...boxing for non-Ruby Java types and
> >> "just" a metaclass for Ruby types. Of course the boxed Java objects
> >> still just use the same metaclass mechanism, since they then appear like
> >> normal Ruby objects. But there's overhead there, about 2X simply calling
> >> standard Ruby objects. That's going to be a post 1.0 area of focus.
> >
> > I box primitive arithmetic and logical types but I have methods on the
> > MetaClass which allow me to handle them unboxed in some circumstances.
>
> Using parameter-doubling I assume? We're planning this change for JRuby
> in the future, to allow the use of "lightweight" objects that don't have
> wrappers. Same basic principle, but in our case it's more to eliminate
> memory overhead (and to some extent call abstraction) than to speed
> anything up.

I'm not sure what "parameter doubling" is :)

Let me give you an example

a + b

results in

get MetaClass for a

call add(a, b) on the MetaClass for a.

a + 1

could result in (it is valid and it would work)

get the MetaClass for a

call add(a, NgInt.valueOf(1)) on the MetaClass for a.

but actually the compiler will generate
get MetaClass for a

call add(a, 1) on the MetaClass for a.

So there are many implementations of add() which allow unboxed
primitives to be passed as parameters.

This, of course, causes a combinatorial explosion and the MetaClass
has a very large number of methods. However each method is very simple
and the implementation can be generated from a script.

My initial very crude measurements show a significant improvement over
Groovy's arithmetic performance (Groovy uses Integer as a wrapper for
int).
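
In Java terms the overloads might look something like the sketch below.
NgIntSketch and IntMetaClassSketch are illustrative names only, not Ng's real
classes, and only the int case is shown; the real MetaClass would have one
such family per operator and primitive type.

```java
// Hypothetical boxed int, standing in for the Ng wrapper object.
final class NgIntSketch {
    final int value;
    private NgIntSketch(int value) { this.value = value; }
    static NgIntSketch valueOf(int value) { return new NgIntSketch(value); }
}

// Hypothetical MetaClass for the boxed int type.
final class IntMetaClassSketch {
    // Fully boxed form: always valid, used when nothing is known statically.
    Object add(Object lhs, Object rhs) {
        return add(lhs, ((NgIntSketch) rhs).value);
    }

    // Right operand unboxed: what the compiler would pick for "a + 1".
    Object add(Object lhs, int rhs) {
        return NgIntSketch.valueOf(((NgIntSketch) lhs).value + rhs);
    }

    // Left operand unboxed: "1 + a".
    Object add(int lhs, Object rhs) {
        return NgIntSketch.valueOf(lhs + ((NgIntSketch) rhs).value);
    }

    // Both operands unboxed.
    Object add(int lhs, int rhs) {
        return NgIntSketch.valueOf(lhs + rhs);
    }
}
```

Java's overload resolution picks the primitive variant whenever the argument
is statically an int, so the call site for "a + 1" never allocates a wrapper
for the literal.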

>
> > I box objects (both POJOs and NG Objects) when they are assigned to a
> > typed variable or cast to a type when passed as a parameter. There are
> > some circumstances (when the type is final) when I can avoid that but,
> > in general, using typed variables will slow the program down
> > (arithmetic and logical primitives are an exception to this rule).
>
> I've noticed this is the case in Groovy too. My first fib benchmarks
> were really slow until I removed all types. So even if Groovy does
> support some kind of optional static typing, you pay a major penalty for
> using it.

Yes, Groovy imposes an overhead for primitive types as well because it
uses Integer, etc. as wrappers and has no special-case code for
arithmetic on primitives.
It's a continual battle to teach people that typing data does not, in
general, make your program faster. It's not in the language to do
that. It's in the language to help you interface to Java.

>
> I'd be interested in knowing if there are any papers or techniques for
> optimizing optional-static-typed languages out there. I don't know of
> any, nor have I done much of a search.

I haven't seen anything. In a fully dynamic language knowing the type
doesn't help you because knowing the type doesn't tell you anything
about the behaviour because the behaviour is dynamic.
Ng seems to be able to benefit from primitive arithmetic typing but
it's very early days and I would not trumpet this as a real advantage
yet.


Yes and it seems to work OK.

John Wilson

John Wilson

Jun 4, 2007, 4:51:36 PM6/4/07
to jvm-la...@googlegroups.com
On 6/4/07, Jochen Theodorou <blac...@gmx.org> wrote:
>
> John Wilson schrieb:
> [...]
> > So if your class has n Closures do you generate n methods on the
> > class? When you pass a Closure as a parameter to a Java method, for
> > example, do you have some kind of adapter object which forwards the
> > call?
> >
> > I'm not completely sure that this would work on Ng (or Groovy, for
> > that matter). As Closures are objects they can have their behaviour
> > modified so they need to have a MetaClass associated with them
>
> In Groovy we could have some kind of context object equal to the current
> Closure object, but instead of calling doCall it would call the method
> for the closure on the class. In the end there would not be fewer
> objects, but fewer classes. As for local variables, they would not exist
> as fields as in the current implementation; they would be given to the
> method as "context". I don't think it would be too difficult to make
> that change for Groovy.

Yes. In Groovy having fewer Classes would be a benefit (fewer
MetaClass instances). In Ng it's not such a benefit as all Closures
use the same MetaClass instance and none ever appear in the Registry.

John Wilson

Charles Oliver Nutter

Jun 4, 2007, 5:21:04 PM6/4/07
to jvm-la...@googlegroups.com
John Wilson wrote:
>> In fact, I avoided a class-per-closure for a very specific reason:
>> constructing yet another object per closure. As it stands now, in JRuby
>> the only objects constructed per-call are the objects directly required
>> for that call like variable scope and call frame data structures.
>
> This is very interesting.
>
> So if your class has n Closures do you generate n methods on the
> class? When you pass a Closure as a parameter to a Java method, for
> example, do you have some kind of adapter object which forwards the
> call?

The mapping goes like this:

AOT Compilation (one class generated per .rb file):
- One method generated for main entry point (Java main)
- One method generated for in-Ruby library loading
- One method generated for the straight through execution of the script
(the "top level" code)
- One method generated for each class body (since Ruby class bodies are
just code executed with a class object as "self")
- One method generated for each physical "block" closure
- top level, class body, and block closure methods can be bound to their
appropriate "callable" object via reflection or via tiny generated
stubs, one-per-method. I believe reflection is not currently 100%
working for AOT-compiled code, but only because demand is low.

JIT compilation (one class generated per JITable method):
- One method generated for the main execution of the method body,
generally in the same location as the top-level method for AOT-compiled code
- One method generated for each block closure in that method
- binding is done directly for the method body itself, and via
reflection or generated stubs as above

So the class load is quite a bit greater for JIT mode, but necessarily
so...we don't know when methods will be JITed, and even if we could
batch them we still want each method to be individually GCable.

> I'm not completely sure that this would work on Ng (or Groovy, for
> that matter). As Closures are objects they can have their behaviour
> modified so they need to have a MetaClass associated with them

Just because Closures are objects doesn't mean you need to generate a
class-per...the body of code itself is still immutable (really, it's the
most static, immutable part of these languages...the smallest,
irreducible, atomic unit we can build upon), and if you need to wrap the
resulting "code object" with some custom object of your own there's no
reason you can't do that with a generic type. In Ruby, this happens when
you convert a "block" into a "proc" or "lambda". It becomes an object
with its own behavior and potentially its own state.
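
To make the "one method per closure body, one generic wrapper type" idea
concrete, here is a small Java sketch using reflection for the binding. It is
not JRuby's implementation; CompiledScriptSketch, BoundBlock and the
(self, arg) calling convention are all assumptions made for the example.

```java
import java.lang.reflect.Method;

// Stand-in for a single generated class holding several closure bodies.
final class CompiledScriptSketch {
    public static Object block0(Object self, Object arg) { return "block0:" + arg; }
    public static Object block1(Object self, Object arg) { return "block1:" + arg; }
}

// One generic wrapper type shared by every closure: no class per closure.
final class BoundBlock {
    private final Method body;   // the immutable code
    private final Object self;   // per-instance state captured at creation

    BoundBlock(Class<?> owner, String methodName, Object self) {
        try {
            this.body = owner.getDeclaredMethod(methodName, Object.class, Object.class);
            this.body.setAccessible(true);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such block body: " + methodName, e);
        }
        this.self = self;
    }

    Object call(Object arg) {
        try {
            // Receiver is null because the generated bodies are static methods.
            return body.invoke(null, self, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A generated stub per method would replace the reflective invoke with a direct
call, trading one extra tiny class for faster dispatch.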

>> To address the GC issue, we generate a classloader-per-class for all
>> classes generated at runtime. It's gross, but it's currently the only
>> way to ensure the classes can GC, as you probably know.
>
> Yes but I can't really see any other way of dealing with this.

We need help from the JVM here.

>> I would strongly urge you to look at ConcurrentHashMap and friends.
>> We're using it extensively in JRuby for common data structures (like the
>> metaclass's map of method objects) and have been very happy with it.
>
> Is there a ConcurrentHashMap of WeakHashMap by any chance?

Probably somewhere :)

>> The strategy we've taken with JRuby, which is very dynamic all the time
>> (i.e. all classes are always open...existing objects can even gain new
>> methods or have new anonymous metaclasses inserted in their hierarchy)
>> is to just promise that the internal data structures governing JRuby
>> will remain structurally sound. There's a whole other raft of issues
>> with being able to modify classes across threads, and I honestly believe
>> it's not the responsibility of the language implementer to address this
>> issue. As with any shared data structure, we expect that the consumers
>> of the language will do their own synchronization to ensure they don't
>> step on their own threads. And perhaps that's the dividing line...if the
>> language consumer could reasonably be expected to do their own
>> synchronization...we don't do it for them.
>
> I certainly think that's a wise approach. However I think it's the
> runtime's responsibility to ensure that a method added to a class by
> one thread is immediately seen by all other threads, for example. Ng
> and Groovy both support this kind of open class (with a different
> language mechanism, of course). Groovy also has Categories which add
> functions to classes but only for the current thread and only for the
> duration of the execution of a closure. This is really very useful but
> quite hard to implement efficiently. I plan to do something like this
> in Ng. If I can manage it I'd also like any child threads spawned in
> the Category closure to inherit the methods.

In Ruby methods added to a class are instantly available to extra-thread
consumers of that class. There's no concept of per-thread class
modifications, even within a given scope. There are possible plans to
add selector namespaces to Ruby 2.0, but that's still some ways out.
We'll support them if Ruby 2.0 does, but for now class structure is
global state in Ruby. The best and only thing we can do is to ensure
multiple threads modifying class structure don't irreparably damage the
class itself. And that's as far as we go with it.

>>> I have a per thread unsynchronised HashMap which is used to look up
>>> mappings. This is used as the first port of call when we want the
>>> MetaClass for a non NgObject. Getting the Map from ThreadLocal storage
>>> is expensive but the compiler can cache the thread context data
>>> structure and incur that cost only once in a method.
>> I'd be careful assuming what the compiler can or should do here. You
>> have one benefit, in that Ng metaclasses can't be replaced. In JRuby we
>> don't have that benefit, so no metaclass caching can be done in most
>> cases. But there's perhaps a few larger issues here:
>
> Although the MetaClass can't be replaced the MetaClass is really just
> a wrapper round an inner MetaClass which can be replaced. So I cheat a
> bit here :)

Then I'd recommend avoiding caching in too many places if possible.
Eventually you either introduce nasty leaks or you spend more time
managing cache entries than running code. And if you're caching a
mostly-static metaclass that wraps a dynamic metaclass that provides
most behavior, you're not actually gaining anything.

There's also the JVM/John Rose angle on this sort of thing...the more
layers and abstractions and contortions between the call site and the
actual code, the less the JVM will be able to do to optimize the code.
To that end, we've been working to reduce the call-path depth and
complexity with every release. Currently, it's down to about four method
calls, not counting recursively walking the AST. When compiled, it's
only those four calls deep for every Ruby invocation.

> The Thread local Maps are simple caches. If it becomes a problem then
> I'll just adopt a more draconian approach to memory management.

I assume this design is largely in response to the overhead of looking
up metaclasses in Groovy. I've been getting the feeling that this is a
constant and painful bottleneck in Groovy right now. Perhaps your work
and research can help remedy it.

> Let me explain the scheme at more length...
>
> In general, Ng Objects never appear in the Canonical MetaClass
> registry nor in the per thread registry cache. Normally the MetaClass
> is created when the class is first loaded and the constructor just
> uses that instance to initialise the metaClass property. In some cases
> this does not happen (but I think I can fix that). In which case the
> mapping appears in the per thread registry cache but not in the
> Canonical MetaClass registry.

Ditto for JRuby. The object itself has a method to get the metaclass
(called "getMetaClass", unsurprisingly...see how this stuff all ends up
in the same place?) so there's no metaclass caching needed.

The only difference for us is how we store the actual metaclasses
outside of instances. Since we are an implementation of Ruby, we manage
classes like Ruby does...as constants in some namespace. So in the
top-level namespace (which actually lives within Object's namespace) you
have constants for "String", "Array", etc. that point to the appropriate
metaclasses. And of course in Ruby classes are just objects, so
metaclasses are just Objects.

A quick example is in order (review for some of you familiar with Ruby
or Smalltalk)...

The Array class extends Object. The Array class object, however, is an
instance of Class, which also extends Object. So to construct an array,
you need a reference to the Array class object. Ignoring literals for
the moment, this would be done by looking up the class object at its
"Array" constant and calling the "new" method on it.

myarray = Array.new
myarray << 'somevalue'

Within JRuby code, when a new Array instance needs to be constructed, we
construct a new RubyArray object, passing at least the following two values:

- a "Ruby" runtime object that says what JRuby runtime this object came from
- a "RubyClass" object that is our internal representation of the Array
class

And there's no way to construct a RubyArray without both of these
present. Ruby objects can't migrate from JRuby runtime to JRuby runtime.
Because the runtime object is always available, we always have a
reference to the top-level namespace, thread-local data, and so on. We
can also have multiple JRuby runtimes with their own classes,
namespaces, caches...all independent, even if the same threads are
calling through them.

>
> For POJOs the Canonical MetaClass registry is the place where the
> metaclasses are created and held. It's expensive to access the
> Canonical MetaClass registry but the thread local registry has to do
> that only once at the first use of a POJO.
>
> It's possible that this might cause a problem at thread startup. It's a
> matter which can only be determined by measurement. If it is a problem
> then it would be possible to give the thread local registry a "starter
> pack" of mappings when it is initially created.

This is probably where we need to start sharing more. This is basically
the same way we call Java types, Groovy calls almost all types, Jython
calls Java types.

Ng's requirements are sounding largely like JRuby's requirements. I
think we need to talk more about finding commonality, pulling out some
utility libraries, and building a set of frameworks and runtimes that
solve the same problems for both of us.

- Charlie

Charles Oliver Nutter

Jun 4, 2007, 5:32:29 PM6/4/07
to jvm-la...@googlegroups.com
John Wilson wrote:
> On 6/4/07, Charles Oliver Nutter <charles...@sun.com> wrote:
>> Using parameter-doubling I assume? We're planning this change for JRuby
>> in the future, to allow the use of "lightweight" objects that don't have
>> wrappers. Same basic principle, but in our case it's more to eliminate
>> memory overhead (and to some extent call abstraction) than to speed
>> anything up.
>
> I'm not sure what "parameter doubling" is :)

Parameter doubling in this case would be having two parameters, one for
a more dynamic type and one for a more specific one. So if you have a
method that can accept anything but usually accepts an int, you would
have the following Java signature:

public DynamicObject doSomething(DynamicObject argA, int argB)

...and then use some mechanism to know whether to use the dynamic
version of the object or not. Or a more concrete example, which will
likely come into play in JRuby soon:

public Object someStringOperation(RubyString argA, String argB)


...thereby avoiding the overhead of constructing a RubyString when we
can avoid it.
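
A tiny Java sketch of what the body of such a method might look like. The
"some mechanism" is left open above; here it is assumed, purely for
illustration, that a null dynamic argument means "use the specific one".
DynamicObjectSketch and DoubledParams are invented names.

```java
// Hypothetical dynamic wrapper type.
final class DynamicObjectSketch {
    final Object payload;
    DynamicObjectSketch(Object payload) { this.payload = payload; }
}

final class DoubledParams {
    // 'boxed' carries the dynamic form of the argument, 'raw' the specific form.
    // Assumed convention: boxed == null means the caller supplied only raw.
    static String describe(DynamicObjectSketch boxed, int raw) {
        if (boxed == null) {
            return "int:" + raw;              // fast path, no wrapper was built
        }
        return "dynamic:" + boxed.payload;    // general path
    }
}
```

A call site that knows it has a plain int would call describe(null, 7) and
skip constructing the wrapper; any other call site passes the wrapper and an
ignored placeholder for the int.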

> but actually the compiler will generate
> get MetaClass for a
>
> call add(a, 1) on the MetaClass for a.
>
> So there are many implementations of add() which allow unboxed
> primitives to be passed as parameters.
>
> This, of course, causes a combinatorial explosion and the MetaClass
> has a very large number of methods. However each method is very simple
> and the implementation can be generated from a script.
>
> My initial very crude measurements show a significant improvement over
> Groovy's arithmetic performance (groovy uses Integer as a wrapper for
> int).

Yes, I would expect to see this kind of improvement too. And I imagine
you're also considering such things as compiling

1 + 1

...as something like

1 + 1

...avoiding metaclass and method dispatch entirely. We also plan this
sort of optimization, to allow for "fast math" operations when we know
that the core class math operations have not been overridden. This will
probably come in JRuby 1.1 or 1.2, since we'll have more room to work on
performance after the 1.0 release. But in general our goal is twofold:

- make actual call signatures at the call site as specific and direct as
possible
- make individual objects as lightweight and native as possible

So some combination of arithmetic optimization, parameter doubling, and
call site adapters will be needed to get us there.
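
The "fast math unless overridden" guard could be sketched in Java along
these lines. This is only an assumed shape, not JRuby code: FastMathSketch
and redefinePlus are hypothetical, and a LongBinaryOperator stands in for a
user-defined replacement of the core "+" method.

```java
import java.util.function.LongBinaryOperator;

final class FastMathSketch {
    // null means the core "+" has not been redefined. Volatile so that a
    // redefinition made by one thread is seen by the others.
    private static volatile LongBinaryOperator override = null;

    // Called by the (hypothetical) runtime when user code redefines "+".
    static void redefinePlus(LongBinaryOperator op) {
        override = op;
    }

    static long plus(long a, long b) {
        LongBinaryOperator op = override;   // one volatile read per call
        if (op == null) {
            return a + b;                   // fast path: plain JVM arithmetic
        }
        return op.applyAsLong(a, b);        // slow path: dispatch to the override
    }
}
```

The guard costs one field read and a null check per operation, which the JIT
can usually handle far better than a full metaclass lookup and dispatch.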

>> I've noticed this is the case in Groovy too. My first fib benchmarks
>> were really slow until I removed all types. So even if Groovy does
>> support some kind of optional static typing, you pay a major penalty for
>> using it.
>
> Yes, Groovy imposes an overhead for primitive types as well because it
> uses Integer, etc. as wrappers and has no special-case code for
> arithmetic on primitives.
> It's a continual battle to teach people that typing data does not, in
> general, make your program faster. It's not in the language to do
> that. It's in the language to help you interface to Java.

Yes, I think that point has not been made clear enough to non-Groovy
users. They tend to think that Groovy is dynamically typed, but can be
statically typed "like Java" (i.e. they think it reverts to Java
behavior) on demand.

- Charlie

John Wilson

Jun 5, 2007, 12:12:14 PM6/5/07
to jvm-la...@googlegroups.com
On 6/4/07, Charles Oliver Nutter <charles...@sun.com> wrote:
>
> John Wilson wrote:
[snip]

> > I'm not completely sure that this would work on Ng (or Groovy, for
> > that matter). As Closures are objects they can have their behaviour
> > modified so they need to have a MetaClass associated with them
>
> Just because Closures are objects doesn't mean you need to generate a
> class-per...the body of code itself is still immutable (really, it's the
> most static, immutable part of these languages...the smallest,
> irreducible, atomic unit we can build upon), and if you need to wrap the
> resulting "code object" with some custom object of your own there's no
> reason you can't do that with a generic type. In Ruby, this happens when
> you convert a "block" into a "proc" or "lambda". It becomes an object
> with its own behavior and potentially its own state.

Yes, this is a difference between Ruby and Groovy. Groovy does not
have the concept of a block, just a closure. This leads to a problem in
Groovy as to the semantics of return and break in a closure which is
used as a block. I'm considering how to address this in Ng. The Ruby
way is attractive but it may be that Java will get Closures. When this
happens I really want Ng blocks/closures to be Java closures to allow
for interoperation. Fortunately I have other problems to solve at the
moment!

In Groovy we found per thread modification rather useful (or, at least I did!).

>
> >>> I have a per thread unsynchronised HashMap which is used to look up
> >>> mappings. This is used as the first port of call when we want the
> >>> MetaClass for a non NgObject. Getting the Map from ThreadLocal storage
> >>> is expensive but the compiler can cache the thread context data
> >>> structure and incur that cost only once in a method.
> >> I'd be careful assuming what the compiler can or should do here. You
> >> have one benefit, in that Ng metaclasses can't be replaced. In JRuby we
> >> don't have that benefit, so no metaclass caching can be done in most
> >> cases. But there's perhaps a few larger issues here:
> >
> > Although the MetaClass can't be replaced the MetaClass is really just
> > a wrapper round an inner MetaClass which can be replaced. So I cheat a
> > bit here :)
>
> Then I'd recommend avoiding caching in too many places if possible.
> Eventually you either introduce nasty leaks or you spend more time
> managing cache entries than running code. And if you're caching a
> mostly-static metaclass that wraps a dynamic metaclass that provides
> most behavior, you're not actually gaining anything.

Well I am gaining something. Looking up the MetaClass of a POJO is
quite expensive compared to getting the MetaClass of an Ng object. (is
the object an instance of NgObject, is it null, get the class and then
look the class up in a HashMap - it all takes cycles).

>
> There's also the JVM/John Rose angle on this sort of thing...the more
> layers and abstractions and contortions between the call site and the
> actual code, the less the JVM will be able to do to optimize the code.
> To that end, we've been working to reduce the call-path depth and
> complexity with every release. Currently, it's down to about four method
> calls, not counting recursively walking the AST. When compiled, it's
> only those four calls deep for every Ruby invocation.

Reducing call depth at the point of dispatch is important for me but
for a different reason.

I want short stack traces :)

With reflection dispatch I'm 3-4 deep at the moment. Precompiled
dispatch will be slightly better I think.

>
> > The Thread local Maps are simple caches. If it becomes a problem then
> > I'll just adopt a more draconian approach to memory management.
>
> I assume this design is largely in response to the overhead of looking
> up metaclasses in Groovy. I've been getting the feeling that this is a
> constant and painful bottleneck in Groovy right now. Perhaps your work
> and research can help remedy it.

The Groovy MetaClass registry has had lots of hands on it over the
last four years and, like any large and complex piece of work, needs
re-engineering. I believe that Jochen is working on this. The Ng
MetaClass registry design is greatly informed by the problems with the
current Groovy implementation (short stack traces and a single use of
instanceof in the entire implementation).

Also all the runtime API is specified as interfaces rather than
concrete classes as in the current Groovy implementation. This, of
course, allows for alternative implementations which may become
important down the line.

>
> > Let me explain the scheme at more length...
> >
> > In general, Ng Objects never appear in the Canonical MetaClass
> > registry nor in the per thread registry cache. Normally the MetaClass
> > is created when the class is first loaded and the constructor just
> > uses that instance to initialise the metaClass property. In some cases
> > this does not happen (but I think I can fix that). In which case the
> > mapping appears in the per thread registry cache but not in the
> > Canonical MetaClass registry.
>
> Ditto for JRuby. The object itself has a method to get the metaclass
> (called "getMetaClass", unsurprisingly...see how this stuff all ends up
> in the same place?) so there's no metaclass caching needed.
>
> The only difference for us is how we store the actual metaclasses
> outside of instances. Since we are an implementation of Ruby, we manage
> classes like Ruby does...as constants in some namespace. So in the
> top-level namespace (which actually lives within Object's namespace) you
> have constants for "String", "Array", etc. that point to the appropriate
> metaclasses. And of course in Ruby classes are just objects, so
> metaclasses are just Objects.

I have a Singleton called NgSystem which allows you to get at the
metaClassRegistry and the MetaClasses for the primitive types and
java.lang.Object so it's not too different.


This sounds attractive - it's really very early days for Ng - the
codebase is quite fluid as you might imagine.

I'll try and spend a little time looking at the JRuby codebase to see
if anything catches my eye.

John Wilson

John Wilson

Jun 5, 2007, 12:22:56 PM6/5/07
to jvm-la...@googlegroups.com
On 6/4/07, Charles Oliver Nutter <charles...@sun.com> wrote:
>
> John Wilson wrote:
> > On 6/4/07, Charles Oliver Nutter <charles...@sun.com> wrote:
> >> Using parameter-doubling I assume? We're planning this change for JRuby
> >> in the future, to allow the use of "lightweight" objects that don't have
> >> wrappers. Same basic principle, but in our case it's more to eliminate
> >> memory overhead (and to some extent call abstraction) than to speed
> >> anything up.
> >
> > I'm not sure what "parameter doubling" is :)
>
> Parameter doubling in this case would be having two parameters, one for
> a more dynamic type and one for a more specific one. So if you have a
> method that can accept anything but usually accepts an int, you would
> have the following Java signature:
>
> public DynamicObject doSomething(DynamicObject argA, int argB)
>
> ...and then use some mechanism to know whether to use the dynamic
> version of the object or not. Or a more concrete example, which will
> likely come into play in JRuby soon:
>
> public Object someStringOperation(RubyString argA, String argB)
>
>
> ...thereby avoiding the overhead of constructing a RubyString when we
> can avoid it.

I don't do that. I actually have multiple methods with the same name
but different combinations of parameters. This sounds and looks
horrendous: the MetaClass has a very large number of methods. However,
it's not quite as horrible as it first appears. You end up with large
numbers of very simple methods which you don't have to write and
maintain by hand. It turns out you can programmatically generate them,
as they are all essentially the same code.

The advantage is that they are very fast.
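
A minimal Java sketch of the overload approach described above. The interface, class and method names are hypothetical and not taken from the Ng source; it only shows how a family of same-named methods lets the Java compiler pick a fast path from the static argument types, with a fully dynamic fallback.

```java
// Hypothetical sketch: none of these names come from the Ng codebase.
interface ArithmeticMetaClass {
    Object add(Object lhs, Object rhs); // fully dynamic slow path
    Object add(Object lhs, int rhs);    // one side statically known to be int
    Object add(int lhs, Object rhs);
    Object add(int lhs, int rhs);       // fast path: no type tests at all
}

class DefaultArithmeticMetaClass implements ArithmeticMetaClass {
    public Object add(Object lhs, Object rhs) {
        // Runtime type tests are only needed when nothing is known statically.
        if (lhs instanceof Integer && rhs instanceof Integer) {
            return add(((Integer) lhs).intValue(), ((Integer) rhs).intValue());
        }
        if (lhs instanceof Number && rhs instanceof Number) {
            return ((Number) lhs).doubleValue() + ((Number) rhs).doubleValue();
        }
        throw new IllegalArgumentException("cannot add " + lhs + " and " + rhs);
    }

    // The mixed overloads all have the same shape, which is why a generator
    // could emit them instead of a person writing them by hand.
    public Object add(Object lhs, int rhs) {
        return add(lhs, (Object) Integer.valueOf(rhs));
    }

    public Object add(int lhs, Object rhs) {
        return add((Object) Integer.valueOf(lhs), rhs);
    }

    public Object add(int lhs, int rhs) {
        return Integer.valueOf(lhs + rhs);
    }
}
```

A call such as `metaClass.add(2, 3)` binds to the `(int, int)` overload at compile time, while `metaClass.add(someObject, 3)` binds to `(Object, int)`. The behaviour can still be swapped by supplying a different implementation of the interface.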

I don't plan to directly compile expressions. I'd rather like Ng to
stay completely dynamic. Ng is an adjunct language to Java so I would
expect classes or single methods which are hotspots to be recoded in
Java.

>
> >> I've noticed this is the case in Groovy too. My first fib benchmarks
> >> were really slow until I removed all types. So even if Groovy does
> >> support some kind of optional static typing, you pay a major penalty for
> >> using it.
> >
> > Yes, Groovy imposes an overhead for primitive types as well because it
> > uses Integer, etc. as wrappers and has no special-case code for
> > arithmetic on primitives.
> > It's a continual battle to teach people that typing data does not, in
> > general, make your program faster. It's not in the language to do
> > that. It's in the language to help you interface to Java.
>
> Yes, I think that point has not been made clear enough to non-Groovy
> users. They tend to think that Groovy is dynamically typed, but can be
> statically typed "like Java" (i.e. they think it reverts to Java
> behavior) on demand.

I think that's true - It's quite a subtle point and you can see why
people fall into the trap of thinking that way. I always try to
correct this misapprehension when talking about Groovy (and Ng). With
Ng you can gain performance using primitive types but not always and
not as much as you would think!

John Wilson

Charles Oliver Nutter

Jun 5, 2007, 2:04:40 PM6/5/07
to jvm-la...@googlegroups.com
John Wilson wrote:
> On 6/4/07, Charles Oliver Nutter <charles...@sun.com> wrote:
>> Just because Closures are objects doesn't mean you need to generate a
>> class-per...the body of code itself is still immutable (really, it's the
>> most static, immutable part of these languages...the smallest,
>> irreducible, atomic unit we can build upon), and if you need to wrap the
>> resulting "code object" with some custom object of your own there's no
>> reason you can't do that with a generic type. In Ruby, this happens when
>> you convert a "block" into a "proc" or "lambda". It becomes an object
>> with its own behavior and potentially its own state.
>
> Yes, this is a difference between Ruby and Groovy. Groovy does not
> have the concept of a block, just a closure. This leads to a problem in
> Groovy as to the semantics of return and break in a closure which is
> used as a block. I'm considering how to address this in Ng. The Ruby
> way is attractive but it may be that Java will get closures. When this
> happens I really want Ng blocks/closures to be Java closures to allow
> for interoperation. Fortunately I have other problems to solve at the
> moment!

John Rose recently posted an entry about using exceptions for non-local
return, to get "correct" (in Neal Gafter's estimation) support for flow
control within closures. So for example, a 'return' within the block
passed to 'each' will actually return from the caller of 'each', rather
than from the block itself. This is how Ruby and JRuby work.

Anyway, in John's post, he showed that by stripping down the exception
itself, return performance is only a few times worse than a straight-up
Java return. I imagine this may depend on the depth of the stack being
unrolled, but it's good news for those of us already implementing
closures using exceptions for non-local flow control. JRuby already uses
this technique, as a lightweight "JumpException" type thrown for all
such non-local events.

In compiled mode, JRuby will additionally just use the normal Java
bytecode for each type of flow control, avoiding the overhead of an
exception in cases where it is possible to do so.

I would recommend using the same approach for blocks in Ng, and perhaps
here's another framework that could be pulled out of our
implementations...a simple, standard mechanism for implementing closures
with non-local flow control.
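
To make the exception-based approach concrete, here is a minimal Java sketch. It is not JRuby's JumpException; every name in it is hypothetical. It assumes the "stripped down" exception simply skips stack-trace capture by overriding `fillInStackTrace`, and it uses a per-activation marker object so a jump is only caught by the method it is meant to return from.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout; a sketch, not JRuby's implementation.
final class NonLocalReturn extends RuntimeException {
    final Object target; // identifies the method activation to return from
    final Object value;  // the value being returned

    NonLocalReturn(Object target, Object value) {
        this.target = target;
        this.value = value;
    }

    // Skip the stack walk, which is usually the expensive part of a throw.
    @Override
    public synchronized Throwable fillInStackTrace() {
        return this;
    }
}

interface Block {
    void call(Object item);
}

class NonLocalReturnDemo {
    // Plays the role of Ruby's 'each': it knows nothing about returns.
    static void each(List<?> items, Block block) {
        for (Object item : items) {
            block.call(item);
        }
    }

    // Returns the first word longer than three characters, or null.
    static Object firstLongWord(List<String> words) {
        final Object marker = new Object(); // unique to this activation
        try {
            each(words, new Block() {
                public void call(Object item) {
                    if (((String) item).length() > 3) {
                        // Acts as "return item" from firstLongWord itself.
                        throw new NonLocalReturn(marker, item);
                    }
                }
            });
            return null;
        } catch (NonLocalReturn jump) {
            if (jump.target != marker) {
                throw jump; // belongs to some outer activation
            }
            return jump.value;
        }
    }
}
```

For example, `NonLocalReturnDemo.firstLongWord(Arrays.asList("a", "bb", "cccc", "ddddd"))` stops iterating at "cccc" and returns it from `firstLongWord`, not merely from the block.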

>> In Ruby methods added to a class are instantly available to extra-thread
>> consumers of that class. There's no concept of per-thread class
>> modifications, even within a given scope. There are possible plans to
>> add selector namespaces to Ruby 2.0, but that's still some ways out.
>> We'll support them if Ruby 2.0 does, but for now class structure is
>> global state in Ruby. The best and only thing we can do is to ensure
>> multiple threads modifying class structure don't irreparably damage the
>> class itself. And that's as far as we go with it.
>
> In Groovy we found per thread modification rather useful (or, at least I did!).

I wonder if per-thread modification would be as useful if you could have
multiple Groovy runtimes being used by the same set of threads. That's
how we manage to deploy many Rails applications in a single JVM...they
each get their own JRuby runtime. Since Rails modifies a number of core
classes, this is the only way to host multiple different apps.

I guess I'd see per-thread modification to classes as being much more
complicated than it's worth. Of course if Ruby 2.0's selector namespaces
come along, we'll have to find an efficient way to support them.

>> Then I'd recommend avoiding caching in too many places if possible.
>> Eventually you either introduce nasty leaks or you spend more time
>> managing cache entries than running code. And if you're caching a
>> mostly-static metaclass that wraps a dynamic metaclass that provides
>> most behavior, you're not actually gaining anything.
>
> Well I am gaining something. Looking up the MetaClass of a POJO is
> quite expensive compared to getting the MetaClass of a Ng object. (is
> the object an instance of NgObject, is it null, get the class and then
> look the class up in a HashMap - it all takes cycles).

Ditto for us on the relative performance. I think this is pretty standard.
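
The cost difference can be seen in a small Java sketch of the lookup sequence quoted above. The types here are hypothetical stand-ins, not the real Ng or Groovy classes, and a ConcurrentHashMap is used in place of the plain HashMap mentioned so that the sketch is safe to call from several threads.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for the real runtime types.
interface MetaClass {
    Class<?> describedClass();
}

interface NgObject {
    MetaClass getMetaClass();
}

final class SimpleMetaClass implements MetaClass {
    private final Class<?> type;

    SimpleMetaClass(Class<?> type) {
        this.type = type;
    }

    public Class<?> describedClass() {
        return type;
    }
}

final class MetaClassLookup {
    // Assumed placeholder MetaClass for null references.
    private static final MetaClass NULL_META_CLASS = new SimpleMetaClass(Void.class);
    private static final Map<Class<?>, MetaClass> REGISTRY = new ConcurrentHashMap<>();

    static MetaClass metaClassOf(Object target) {
        if (target instanceof NgObject) {
            // Cheap case: the object carries its own MetaClass.
            return ((NgObject) target).getMetaClass();
        }
        if (target == null) {
            return NULL_META_CLASS;
        }
        // POJO case: a getClass() call plus a map probe on every lookup.
        return REGISTRY.computeIfAbsent(target.getClass(), SimpleMetaClass::new);
    }
}
```

The NgObject branch is a type test and one interface call, whereas the POJO branch pays for hashing and a map probe each time, which is the overhead a cache in front of the registry is meant to reduce.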

>> There's also the JVM/John Rose angle on this sort of thing...the more
>> layers and abstractions and contortions between the call site and the
>> actual code, the less the JVM will be able to do to optimize the code.
>> To that end, we've been working to reduce the call-path depth and
>> complexity with every release. Currently, it's down to about four method
>> calls, not counting recursively walking the AST. When compiled, it's
>> only those four calls deep for every Ruby invocation.
>
> Reducing call depth at the point of dispatch is important for me but
> for a different reason.
>
> I want short stack traces :)

You and me both...consider that we have a recursive interpreter too and
you can imagine our stacktraces are legendary.

>> I assume this design is largely in response to the overhead of looking
>> up metaclasses in Groovy. I've been getting the feeling that this is a
>> constant and painful bottleneck in Groovy right now. Perhaps your work
>> and research can help remedy it.
>
> The Groovy MetaClass registry has had lots of hands on it over the
> last four years and, like any large and complex piece of work, needs
> re-engineering. I believe that Jochen is working on this. The Ng
> MetaClass registry design is greatly informed by the problems with the
> current Groovy implementation (short stack traces and a single use of
> instanceof in the entire implementation).
>
> Also, the whole runtime API is specified as interfaces rather than
> concrete classes as in the current Groovy implementation. This, of
> course, allows for alternative implementations, which may become
> important down the line.

We are going to be making a move toward a metaclass interface as well.
We should collaborate on a common interface; I think such a thing is
possible. That would in theory bridge 90% of the gap between Ng and Ruby
(and any other language that implements the same interface) since we
could use the same call-site dispatch logic to call across languages. So
here is another area where collaboration will work. We must do it.

>> Ng's requirements are sounding largely like JRuby's requirements. I
>> think we need to talk more about finding commonality, pulling out some
>> utility libraries, and building a set of frameworks and runtimes that
>> solve the same problems for both of us.
>
>
> This sounds attractive - it's really very early days for Ng - the
> codebase is quite fluid, as you might imagine.
>
> I'll try to spend a little time looking at the JRuby codebase to see
> if anything catches my eye.

I can imagine at the very least we could start pulling out some common
interfaces and adjusting our implementations to match them. That would
be a good start on our little JLR (JVM Language Runtime) venture :)

- Charlie

Charles Oliver Nutter

Jun 5, 2007, 2:13:35 PM6/5/07
to jvm-la...@googlegroups.com
John Wilson wrote:
> On 6/4/07, Charles Oliver Nutter <charles...@sun.com> wrote:
> I don't do that. I actually have multiple methods with the same name
> but different combinations of parameters. This sounds and looks
> horrendous: the MetaClass has a very large number of methods. However,
> it's not quite as horrible as it first appears. You end up with large
> numbers of very simple methods which you don't have to write and
> maintain by hand. It turns out you can programmatically generate them,
> as they are all essentially the same code.
>
> The advantage is that they are very fast.

I plan to do this dynamically at runtime, depending on the actual types
of parameters...informed by a John Rose suggestion.

> I don't plan to directly compile expressions. I'd rather like Ng to
> stay completely dynamic. Ng is an adjunct language to Java so I would
> expect classes or single methods which are hotspots to be recoded in
> Java.

Here's an area where I'll strongly disagree...I think encouraging people
to rewrite hotspots in Java should be an absolute last resort. Why?
Because in general, I'd rather write in Ruby or Ng or Groovy. I think
encouraging people to rewrite Ng hotspots in Java only leads to one
thing: different Ng hotspots. Encourage people to profile their hotspots
and figure out why they're slow...then fix it. If everyone that runs
into a performance problem immediately rewrites that code in Java, it
hurts me as a Ruby or Ng or Groovy developer because it takes the
pressure off JRuby or Ng or Groovy implementers to improve performance.
I want to keep the pressure on :)

The "write it in C" attitude prevails in the Ruby world too, and as a
result there's practically no pressure on the core Ruby team to improve
performance. I say "write it in Ruby, and help make Ruby faster".

Note that I certainly don't advocate using a dynlang for
everything...but I really despise the knee-jerk reaction some folks have
to abandoning a slow dynlang for a fast static lang. There's no reason
we shouldn't be able to make dynlangs screaming fast, so let's focus on
that first. Let's not encourage people to jump ship at the slightest
hint of a performance problem.

- Charlie

John Wilson

Jun 5, 2007, 3:02:53 PM6/5/07
to jvm-la...@googlegroups.com
On 6/5/07, Charles Oliver Nutter <charles...@sun.com> wrote:
>
> John Wilson wrote:
> > On 6/4/07, Charles Oliver Nutter <charles...@sun.com> wrote:
> >> Just because Closures are objects doesn't mean you need to generate a
> >> class-per...the body of code itself is still immutable (really, it's the
> >> most static, immutable part of these languages...the smallest,
> >> irreducible, atomic unit we can build upon), and if you need to wrap the
> >> resulting "code object" with some custom object of your own there's no
> >> reason you can't do that with a generic type. In Ruby, this happens when
> >> you convert a "block" into a "proc" or "lambda". It becomes an object
> >> with its own behavior and potentially its own state.
> >
> > Yes, this is a difference between Ruby and Groovy. Groovy does not
> > have the concept of a block, just a closure. This leads to a problem in
> > Groovy as to the semantics of return and break in a closure which is
> > used as a block. I'm considering how to address this in Ng. The Ruby
> > way is attractive but it may be that Java will get closures. When this
> > happens I really want Ng blocks/closures to be Java closures to allow
> > for interoperation. Fortunately I have other problems to solve at the
> > moment!
>
> John Rose recently posted an entry about using exceptions for non-local
> return, to get "correct" (in Neal Gafter's estimation) support for flow
> control within closures. So for example, a 'return' within the block
> passed to 'each' will actually return from the caller of 'each', rather
> than from the block itself. This is how Ruby and JRuby work.

Yes, John proposed something very similar when he was working on
Groovy a couple of years ago.

I'm sure a standard way will emerge if and when Closures make it into
Java. As soon as somebody decides which proposal is to be adopted and
the detailed design work is done I'll just steal it for Ng. It's
important that the implementations align as closely as possible.

Ng can already support multiple runtimes. It's just a matter of using
standard ClassLoader magic to load multiple instances of the NgSystem
"singleton" so that the classes loaded by the root classloader or its
children see different NgSystem instances. It's probably possible to
do this for Groovy as well, though it's messier as there are several
classes that kind of provide the runtime services (lots of different
XXXadaptors and Invoker, InvokerHelper, etc.).

>
> I guess I'd see per-thread modification to classes as being much more
> complicated than it's worth. Of course if Ruby 2.0's selector namespaces
> come along, we'll have to find an efficient way to support them.

I'm rather nervous of unrestricted open classes (even though the Ng
runtime supports it).

Neal Ford wrote a bit about this
http://memeagora.blogspot.com/2007/05/are-open-classes-evil.html#2593062978220247339

The comments are worth reading - there's an obligatory "it's OK in
Ruby because we Ruby users are the top 20% of the programming talent"
(I know you're not all like that, but it's great to have one's
prejudices confirmed from time to time!).

One of my main target audiences for Ng is the top 100% of the
non-professional programmer market (I admire how the Python community
manages to mix programmers and non-programmers). I need to find some
way of keeping the benefits of open classes whilst stopping inadvertent
interference.

Multiple runtimes sounds nice but the storage overhead worries me.
MetaClasses are not lightweight objects.

The interfaces are all in the SVN repository. Pretty much everything
in ng.runtime is an interface; the implementations are all in
uk.co.wilson.ng.*

MetaClass is basically the MOP - user programs can rely on it.

RuntimeMetaClass is a subclass of MetaClass which the compiler can rely
on. It can be changed in a breaking way, in which case you will have to
recompile your programs (like the version number in a class file).

InternalMetaClass provides an interface which allows MetaClasses to
manipulate each other. It's out of bounds for normal users and the
compiler. It can be changed in breaking ways but this should not
matter as you will just get a new runtime system and you will not have
to recompile your code. Users writing their own custom MetaClasses to
implement higher order magic can use this but they will have to manage
the breakage.
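
As a rough illustration of this layering, here is a Java sketch. Only the three interface names come from the description above. Every method signature, the implementing class, and the choice to keep InternalMetaClass separate from the other two interfaces are assumptions made for the example, not the contents of ng.runtime.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// The MOP: the stable surface user programs may rely on.
interface MetaClass {
    Object invokeMethod(Object instance, String name, Object[] args);
}

// What compiled code links against; a breaking change here would mean
// recompiling programs. The single-argument entry point is hypothetical.
interface RuntimeMetaClass extends MetaClass {
    Object call(Object instance, String name, Object arg0);
}

// Lets MetaClasses manipulate each other; off limits to normal users
// and to the compiler. The method shown is hypothetical.
interface InternalMetaClass {
    void addMethod(String name, Function<Object[], Object> body);
}

// One implementation can sit behind all three views.
final class MapBackedMetaClass implements RuntimeMetaClass, InternalMetaClass {
    private final Map<String, Function<Object[], Object>> methods = new HashMap<>();

    public Object invokeMethod(Object instance, String name, Object[] args) {
        Function<Object[], Object> body = methods.get(name);
        if (body == null) {
            throw new IllegalArgumentException("no such method: " + name);
        }
        return body.apply(args);
    }

    public Object call(Object instance, String name, Object arg0) {
        return invokeMethod(instance, name, new Object[] {arg0});
    }

    public void addMethod(String name, Function<Object[], Object> body) {
        methods.put(name, body);
    }
}
```

User code would hold the object as a `MetaClass`, generated code as a `RuntimeMetaClass`, and only custom MetaClass authors would see it as an `InternalMetaClass`, so each audience is exposed to just the surface whose stability guarantees apply to it.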


>
> >> Ng's requirements are sounding largely like JRuby's requirements. I
> >> think we need to talk more about finding commonality, pulling out some
> >> utility libraries, and building a set of frameworks and runtimes that
> >> solve the same problems for both of us.
> >
> >
> > This sounds attractive - it's really very early days for Ng - the
> > codebase is quite fluid, as you might imagine.
> >
> > I'll try to spend a little time looking at the JRuby codebase to see
> > if anything catches my eye.
>
> I can imagine at the very least we could start pulling out some common
> interfaces and adjusting our implementations to match them. That would
> be a good start on our little JLR (JVM Language Runtime) venture :)

Have a look at the Ng *MetaClass interfaces and see how revolted you are :)

John Wilson

John Wilson

Jun 8, 2007, 7:56:02 AM6/8/07
to jvm-la...@googlegroups.com


I have done a little bit of experimenting and measuring to see what
the effect is of using primitive types vs untyped variables in Ng.

I basically hand compiled the Java Mandelbrot benchmark from
http://www.timestretch.com/FractalBenchmark.html into the code that
should be emitted from the Ng compiler (i.e. just a series of calls to
the Ng MetaClass).

The first implementation used typed variables just like the Java code.
This resulted in quite a lot of conversion of results of operations
back to primitive values.

I then converted that code to represent what would be generated if
none of the variables were typed. This resulted in considerably
simpler code as no boxing/unboxing was performed.

Note that in both cases the resulting program is fully dynamic (i.e.
the behaviour of the operations performed on the values could be
changed dynamically at runtime).

Much to my surprise the typed version was considerably quicker than
the untyped one.

Original Java version: 0.17 seconds
Typed Ng version:      6.4 seconds
Untyped Ng version:    19.4 seconds

(test run inside an IDE - not to be taken as good measurements)

I expected the untyped version to be slower but not by a factor of 3.

This is a rather special case - Ng will take advantage of typing
information when primitive (and BigInteger and BigDecimal) values are
used in arithmetic and logical operations. There are type-specific
methods in the MetaClass which provide a fast path for such operations.

I would expect Groovy to exhibit the opposite behaviour as the Groovy
MetaClass has no special-case methods for primitives (I have not tried
this - Groovy's arithmetic rules promote the results of floating-point
operations to double in all cases, so it's difficult to put together an
untyped Groovy version which provides a fair comparison).


John Wilson
