[jvm-l] concurrency problems, is the JVM missing something here?


Jochen Theodorou

Apr 20, 2010, 4:19:17 AM4/20/10
to jvm-la...@googlegroups.com
Hi all,

For quite a while I have been looking for the best approach to tackle the
Groovy speed problem, but it seems I will not be able to fully solve it
without solving my concurrency problem.

My problem is more or less the following. A method call is done using
the data of a meta class, which is mostly a caching structure that can
be recreated unless certain conditions are met, which we don't need to
know for now. For each method call I need to check on this meta class.
And here part of the problem starts: how do I get that object?

If it is stored in a volatile, a ThreadLocal, or even retrieved through
a data structure that uses synchronized, I get into problems with
HotSpot.

The ideal way for me to do this and other things would be the ability to
write a fully initialized object to main memory, where other threads
can, but don't have to, get that value. They learn of it either as part
of other synchronization, or through an instruction the user forces. The
advantage is that this does not have to happen on each read, as with
volatile. In theory ThreadLocal could be good enough if there were no
speed issue. True, ThreadLocal has been improved greatly over the years,
but it is still slower than, for example, volatile, because I cannot
store the ThreadLocal object for long. So I would still have to get a
ThreadLocal for each method invocation, and that is bitter.
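A minimal sketch of the contrast (all names here are illustrative, not Groovy's real classes): the volatile version pays for the memory-ordering guarantee on every lookup, while the plain version only picks up a new value at an explicit refresh point.

```java
// Illustrative sketch of the two access patterns under discussion.
class MetaClass {
    final String name;
    MetaClass(String name) { this.name = name; }
}

class VolatileSite {
    // Pattern to avoid: every call pays for a volatile read.
    volatile MetaClass meta = new MetaClass("v1");
    MetaClass lookup() { return meta; }
}

class PlainSite {
    // Desired pattern: plain read on the hot path; a thread is only
    // guaranteed to see a new value at an explicit refresh point
    // (or via other synchronization).
    MetaClass meta = new MetaClass("v1");
    MetaClass lookup() { return meta; }                      // no barrier
    synchronized void refresh(MetaClass latest) { this.meta = latest; }
}
```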

As I see it I can either use native code and maybe cause problems for
the JVM, or I am on a completely wrong track here and am simply missing
something.

So I am asking here for ideas. Does anyone have one?

bye Jochen

--
Jochen "blackdrag" Theodorou
The Groovy Project Tech Lead (http://groovy.codehaus.org)
http://blackdragsview.blogspot.com/


Brian Hurt

Apr 20, 2010, 12:30:38 PM4/20/10
to jvm-la...@googlegroups.com
On Tue, Apr 20, 2010 at 4:19 AM, Jochen Theodorou <blac...@gmx.org> wrote:
> [...]
>
> So I am asking here for ideas. Does anyone have one?


Have you considered using java.util.concurrent.atomic.AtomicReference variables? 

I'm thinking of a design something like this: every object of a class has a reference to the AtomicReference that holds its class definition. All objects of the same class point to the same AtomicReference (this is why there is a double indirection). Getting the current class definition compiles down into just two memory accesses: not perfect, but not bad. When the class definition is changed, the change updates the atomic reference, making the new definition visible to all instances simultaneously.

I'm not sure what behavior you want, but something like this might work.
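A rough sketch of that double indirection (ClassDef and DynObject are illustrative names, not from any real runtime):

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch of the double indirection: all instances of one
// class share a single AtomicReference, so one set() swaps the class
// definition for every instance at once.
class ClassDef {
    final String version;
    ClassDef(String version) { this.version = version; }
}

class DynObject {
    final AtomicReference<ClassDef> classDef;  // shared per class
    DynObject(AtomicReference<ClassDef> classDef) { this.classDef = classDef; }
    ClassDef currentDef() { return classDef.get(); }  // two memory accesses
}
```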

Brian
 

Peter Veentjer

Apr 20, 2010, 12:59:02 PM4/20/10
to jvm-la...@googlegroups.com
An AtomicReference is not going to outperform a volatile variable. So
if a volatile variable is a no-go, an AtomicReference certainly is a
no-go.

And if there are multiple classes, each with its own class definition,
making all the new class definitions appear at once is going to be
problematic. So, if possible, I would certainly look at how strict this
behaviour really needs to be.

Patrick Wright

Apr 20, 2010, 1:06:52 PM4/20/10
to jvm-la...@googlegroups.com
> My problem is more or less the following. [...] In theory ThreadLocal
> could be good enough, if there were no speed issue. [...] So I would
> still have to get a ThreadLocal for each method invocation and that is
> bitter.

I'm not sure I'm clear on the ownership of the metadata. If Thread-A
modifies the metadata, must this be "immediately" visible to Thread-B
(and C and D...)? If the metadata is, essentially, global across
threads, then any change made by one thread has to be propagated
(flushed) back to main memory and then pulled back up into the
thread/CPU-local cache, using write and read barriers, if I understand
correctly. I don't know any way around that; conceptually, as I
understand it, individual threads and cores may (or will) have their own
local, detached copy of whatever they need from main memory. At some
point that copy and the later re-synchronization have to take place.
That sort of data sharing, having a consistent view of mutable state
across threads, is simply expensive on some level (for some value of
"expensive").

If the given metadata is considered fixed/static for the execution of
a thread, and you don't want the cost of a thread local, then you
could probably weave it into the stack as a (to the user, hidden)
method parameter, no? Then you would avoid the indirection of the
thread local lookup. In the end, I think of a TL as a way to access
data across method calls within a single thread of execution when the
value is not available on the stack, like a "thread global" variable
(haha). If you're actually writing the compiler, though, you might be
able to do this sort of extension of the parameter list automatically.
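A sketch of that parameter-weaving idea (illustrative names; a real compiler would generate the extra overloads automatically):

```java
// Illustrative sketch: the compiler threads the metadata through the
// call stack as a hidden extra parameter, so no ThreadLocal lookup is
// needed inside a single thread of execution.
class Meta {
    final String info = "meta-v1";
}

class Woven {
    // What the user writes: int length(String s)
    // What the compiler emits: an overload carrying Meta on the stack.
    static int length(String s, Meta meta) {
        return helper(s, meta);            // meta flows via the stack
    }
    static int helper(String s, Meta meta) {
        return s.length();                 // metadata at hand, no lookup
    }
    // Bridge for calls arriving from plain Java, which cannot pass Meta:
    static int length(String s) {
        return length(s, new Meta());
    }
}
```

This illustrates the trade-off Jochen discusses below: Groovy-to-Groovy calls stay fast, but crossing the Java boundary falls back to the bridge overload.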


Patrick

Charles Oliver Nutter

Apr 20, 2010, 1:08:00 PM4/20/10
to jvm-la...@googlegroups.com
On Tue, Apr 20, 2010 at 3:19 AM, Jochen Theodorou <blac...@gmx.org> wrote:
> For quite a while I have been looking for the best approach to tackle
> the Groovy speed problem [...] For each method call I need to check on
> this meta class. And here part of the problem starts: how do I get
> that object?

Can you explain in terms of the Groovy codebase what you mean by "the
Groovy speed problem"? I may be able to explain what we have done in
JRuby for similar situations.

- Charlie

Jochen Theodorou

Apr 20, 2010, 1:15:21 PM4/20/10
to jvm-la...@googlegroups.com
Brian Hurt wrote:
[...]
> Have you considered using java.util.concurrent.atomic.AtomicReference
> variables?
>
> I'm thinking of a design something like this: every object of a class
> has a reference to the AtomicReference that holds its class definition.
> All objects of the same class all point to the same AtomicReference
> (this is why there is a double indirection). Getting the current class
> definition compiles down into just two memory accesses- not perfect, but
> not bad. When the class definition is changed, the change updates the
> atomic reference, which updates all the classes simultaneously.
>
> I'm not sure what behavior you want, but something like this might work.

ok, but what do you do with objects from classes you have no control over?

bye Jochen

--
Jochen "blackdrag" Theodorou
The Groovy Project Tech Lead (http://groovy.codehaus.org)
http://blackdragsview.blogspot.com/

Jochen Theodorou

Apr 20, 2010, 1:25:52 PM4/20/10
to jvm-la...@googlegroups.com
Patrick Wright wrote:
[...]
> I'm not sure I'm clear on the ownership of the metadata. If Thread-A
> modifies the metadata, must this be "immediately" visible to Thread-B
> (and C and D...)?

no. I could even imagine having the user do something special to make it
visible. So neither "immediately" nor "automatic" is an absolute
requirement for me.

[...]
> If the given metadata is considered fixed/static for the execution of
> a thread, and you don't want the cost of a thread local, then you
> could probably weave it into the stack as a (to the user, hidden)
> method parameter, no? Then you would avoid the indirection of the
> thread local lookup.

In Groovy you constantly cross from the Groovy domain into the Java
domain and back. Groovy is not a layer on top of Java; it sits more side
by side with it. As such I of course lose this bonus as soon as I have
to call a Java method, or as soon as Java calls a Groovy method. This
would certainly be good enough to fool some benchmarks, but I would have
to generate a lot of additional methods (for the variant with the
additional parameter). Actually I was going to say this cannot work, but
now that I think about it again, maybe the penalties are not as big as I
think, and Groovy-to-Groovy method calling would be quite fast then. It
is still not the maximum possible speed, of course.

bye Jochen

--
Jochen "blackdrag" Theodorou
The Groovy Project Tech Lead (http://groovy.codehaus.org)
http://blackdragsview.blogspot.com/

Jochen Theodorou

Apr 20, 2010, 3:48:19 PM4/20/10
to jvm-la...@googlegroups.com
Rich Hickey wrote:
> I think the problem he's described is a general one, and the true #1
> problem for dynamic languages on the JVM (vs the stuff JSR 292 deals
> with), and will remain a problem for JSR 292.

I think that is not yet really in scope for JSR 292. I started this
thread here partially because I wanted to know if the issue should be
made more widely known there. If enough people agree, maybe we can bring
it up on the mlvm list.

bye Jochen

--
Jochen "blackdrag" Theodorou
The Groovy Project Tech Lead (http://groovy.codehaus.org)
http://blackdragsview.blogspot.com/

Rich Hickey

Apr 20, 2010, 3:37:00 PM4/20/10
to JVM Languages


On Apr 20, 1:08 pm, Charles Oliver Nutter <head...@headius.com> wrote:
> On Tue, Apr 20, 2010 at 3:19 AM, Jochen Theodorou <blackd...@gmx.org> wrote:
> > [...]
>
> Can you explain in terms of the Groovy codebase what you mean by "the
> Groovy speed problem"? I may be able to explain what we have done in
> JRuby for similar situations.
>
> - Charlie

I think the problem he's described is a general one, and the true #1
problem for dynamic languages on the JVM (versus the stuff JSR 292 deals
with), and it will remain a problem even with JSR 292.

Being dynamic, we all support some notion of:

defining some function/method foo
defining some function/method bar that calls foo
allowing the user to redefine foo, either via repl interaction or
runtime code loading
users expecting bar to use the new definition of foo

Thus we all, being in userland, have to go through a volatile or worse
(on every call) in order to ensure such changes are visible to all
threads.
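The userland pattern Rich describes can be sketched like this (illustrative names; the point is that the volatile read happens on every call to bar, not just after a redefinition):

```java
import java.util.function.IntUnaryOperator;

// Illustrative sketch: a "function table" entry for foo is held in a
// volatile field so that a redefinition becomes visible to all threads;
// the price is a volatile read inside every call to bar.
class DynRuntime {
    static volatile IntUnaryOperator foo = x -> x + 1;

    static int bar(int x) {
        return foo.applyAsInt(x) * 2;   // volatile read on every call
    }
}
```

A post-redefinition "safepoint flush", as Rich suggests, would let foo be a plain field on the read side, paying the synchronization cost only at the (rare) redefinition.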

It seems to me like we all could use some sort of global, cache-
flushing safepoint call to make after these code updates (since they
are so infrequent, especially in production), so our normal call paths
could avoid the overhead (and inlining suppression) of volatiles.

If you have any clever way around this in JRuby, please share.

Thanks,

Rich

Jochen Theodorou

Apr 20, 2010, 3:21:12 PM4/20/10
to jvm-la...@googlegroups.com
Charles Oliver Nutter wrote:
[...]
> We do want to move away from having the wrappers, however, since they
> can sometimes interfere with object identity and certainly increase
> the memory footprint of Java objects flowing through Rubyspace. In
> that case we would have the same challenge as you to recover the
> metaclass on each call. My strategy would most likely be to build in
> per-site polymorphic caches that can hold references to previous
> metaclasses they have seen, allowing for a simple identity check. I
> assume you've done this already in your inline caching.

yes, the "call site" holds the meta class for the java object.

> All classes in JRuby are mutable, which means every call must perform
> a check to see if their cache is now stale. In JRuby, this boils down
> to caching a tuple of [token, method] where the token is just an
> Object instance. The token is compared with that on the incoming class
> to know whether it's the same class in the same state as the previous
> call. Failure means the tuple is dumped and re-cached. On the
> metaclass side, changes to one metaclass flip the token of that class
> but also flip the tokens of all descendants.

ok, but I doubt you can do that multithreaded and without synchronization.

> Ultimately this means for a successfully cached call site in JRuby,
> the cost is only a volatile read plus an object comparison (not
> counting non-volatile fields to get the cache tuple and the token and
> method, of course). It's pretty small.

small, yes. But as I talked about at the language summit, even using a
volatile can cause HotSpot big problems. Not to mention multiple
threads working on the same volatile for each method invocation. Not
only does HotSpot have to handle much bigger code than usual (there is
probably no real solution to this), but the usage of volatiles makes it
difficult for it to move code.

>> Even if we had method handles (Groovy is based on Java 1.5 atm), I would
>> still have to do things like checking if there is an active category, if the
>> metaclass is still the same and not mutated (ExpandoMetaClass allows
>> mutation)
>>
>> For the category stuff it might be interesting to look at
>> org.codehaus.groovy.runtime.GroovyCategorySupport#hasCategoryInCurrentThread,
>> which uses an AtomicInteger in the first place and if that check fails we
>> have to get a ThreadLocal storing the real category information.
>
> The solution that comes to mind for me is that all calls should bounce
> through some "invoker" that is per-thread. The invoker would be a
> non-category invoker normally and once a category is active the
> threadlocal invoker would be replaced with a category-aware invoker.
> Because no other threads would need to see the per-thread invoker, all
> access to it would be nonvolatile, and you would avoid the cost of
> checking for an installed category on every call.
>
> Of course I must admit I don't know all the details of categories.

the problem is more about how to realize the thread-local invoker. I
have not found a way to do this unless I use ThreadLocal, which has
issues of its own.

>> org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite#checkPojoMetaClass()
>> contains the check we do for a cached java method. Something similar is done
>> for method calls on groovy objects.
>
> Yes, this will be interesting for us once we eliminate the wrappers
> around Java object. For the moment, it appears the wrappers make it
> about twice as expensive for a Java object to enter Ruby in JRuby as
> it would cost for the same call to happen in Groovy.

actually I am not sure about the direction you are talking about here.
This is for calling a Java method on a Java object from Groovy, so the
direction is from Groovy into Java. The cost for Java to call a Groovy
method the static way is the same as for any other Java-based method,
since this works completely statically. Only if you want the dynamic
behaviour do you have to use InvokerHelper, and there it gets quite
expensive, because of course no call site caching is done.

bye Jochen

--
Jochen "blackdrag" Theodorou
The Groovy Project Tech Lead (http://groovy.codehaus.org)
http://blackdragsview.blogspot.com/

Jochen Theodorou

Apr 20, 2010, 1:52:49 PM4/20/10
to jvm-la...@googlegroups.com
Charles Oliver Nutter wrote:
[...]
> Can you explain in terms of the Groovy codebase what you mean by "the
> Groovy speed problem"? I may be able to explain what we have done in
> JRuby for similar situations.

I spent quite some time getting an overview and counted about 20 points
where concurrent stuff is involved on the first call of a method. If the
method is called again, this is reduced to some volatile accesses, to
see if a category is active and some other things. For example, to get
the metaclass. In the case of a GroovyObject this is easy, since the
object itself stores the meta class. In the case of a Java-based object
we need to get the metaclass from the global registry, which is more or
less a concurrent map. In the best case this will involve access to a
volatile too.

Even if we had method handles (Groovy is based on Java 1.5 atm), I would
still have to do things like checking whether there is an active
category and whether the metaclass is still the same and not mutated
(ExpandoMetaClass allows mutation).

For the category stuff it might be interesting to look at
org.codehaus.groovy.runtime.GroovyCategorySupport#hasCategoryInCurrentThread,
which uses an AtomicInteger in the first place and if that check fails
we have to get a ThreadLocal storing the real category information.

org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite#checkPojoMetaClass()
contains the check we do for a cached java method. Something similar is
done for method calls on groovy objects.

I just noticed that I can remove that usage of AtomicInteger... I guess
when Alex implemented it he had a different usage in mind. Of course all
this call site stuff could mostly be replaced with MethodHandles, but we
would still have to check the categories, normally.

not sure this really helps you

bye Jochen

--
Jochen "blackdrag" Theodorou
The Groovy Project Tech Lead (http://groovy.codehaus.org)
http://blackdragsview.blogspot.com/

Jochen Theodorou

unread,
Apr 20, 2010, 1:56:25 PM4/20/10
to jvm-la...@googlegroups.com
Jochen Theodorou wrote:
[...]
> I just noticed that I can remove that usage of AtomicInteger...

ah, I overlooked that this is actually the Category stuff, so it is needed.

Charles Oliver Nutter

Apr 20, 2010, 2:41:12 PM4/20/10
to jvm-la...@googlegroups.com
On Tue, Apr 20, 2010 at 12:52 PM, Jochen Theodorou <blac...@gmx.org> wrote:
> I spent quite some time getting an overview and counted about 20 points
> where concurrent stuff is involved on the first call of a method. If the
> method is called again, this is reduced to some volatile accesses, to see
> if a category is active and some other things. For example, to get the
> metaclass. In the case of a GroovyObject this is easy, since the object
> itself stores the meta class. In the case of a Java-based object we need
> to get the metaclass from the global registry, which is more or less a
> concurrent map. In the best case this will involve access to a volatile too.

The equivalent case in JRuby would be when a Java object enters "Ruby
space", at which time we have to look up its metaclass. However in
JRuby all such objects are then wrapped in a thin wrapper that holds a
hard reference to the metaclass. Subsequent calls then have immediate
non-volatile field access to the metaclass for the object, and the
only remaining cost is picking the right method overload (which also
gets cached based on incoming argument types).

We do want to move away from having the wrappers, however, since they
can sometimes interfere with object identity and certainly increase
the memory footprint of Java objects flowing through Rubyspace. In
that case we would have the same challenge as you to recover the
metaclass on each call. My strategy would most likely be to build in
per-site polymorphic caches that can hold references to previous
metaclasses they have seen, allowing for a simple identity check. I
assume you've done this already in your inline caching.

All classes in JRuby are mutable, which means every call must perform
a check to see if their cache is now stale. In JRuby, this boils down
to caching a tuple of [token, method] where the token is just an
Object instance. The token is compared with that on the incoming class
to know whether it's the same class in the same state as the previous
call. Failure means the tuple is dumped and re-cached. On the
metaclass side, changes to one metaclass flip the token of that class
but also flip the tokens of all descendants.

Ultimately this means for a successfully cached call site in JRuby,
the cost is only a volatile read plus an object comparison (not
counting non-volatile fields to get the cache tuple and the token and
method, of course). It's pretty small.
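The [token, method] scheme can be sketched roughly as follows (illustrative names, not JRuby's actual classes):

```java
// Illustrative sketch of a JRuby-style inline cache: the call site
// caches a [token, method] tuple; any metaclass mutation flips the
// token, so the next call notices the stale tuple and re-caches.
class MutableMeta {
    volatile Object token = new Object();  // flipped on every mutation
    String method = "impl-v1";

    void mutate(String newImpl) {
        method = newImpl;
        token = new Object();  // volatile write publishes `method` too
    }
}

class CacheSite {
    Object cachedToken;
    String cachedMethod;

    String call(MutableMeta mc) {
        Object t = mc.token;          // the one volatile read per call
        if (t != cachedToken) {       // cheap identity check
            cachedToken = t;          // miss: dump and re-cache
            cachedMethod = mc.method;
        }
        return cachedMethod;
    }
}
```

Note the ordering in mutate(): the plain write to `method` happens before the volatile write to `token`, so a reader that sees the new token also sees the new method.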

> Even if we had method handles (Groovy is based on Java 1.5 atm), I would
> still have to do things like checking if there is an active category, if the
> metaclass is still the same and not mutated (ExpandoMetaClass allows
> mutation)
>
> For the category stuff it might be interesting to look at
> org.codehaus.groovy.runtime.GroovyCategorySupport#hasCategoryInCurrentThread,
>  which uses an AtomicInteger in the first place and if that check fails we
> have to get a ThreadLocal storing the real category information.

The solution that comes to mind for me is that all calls should bounce
through some "invoker" that is per-thread. The invoker would be a
non-category invoker normally and once a category is active the
threadlocal invoker would be replaced with a category-aware invoker.
Because no other threads would need to see the per-thread invoker, all
access to it would be nonvolatile, and you would avoid the cost of
checking for an installed category on every call.
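A sketch of the per-thread invoker swap (illustrative names; note that this still pays one ThreadLocal lookup per call, which is exactly the cost Jochen raises below):

```java
// Illustrative sketch: each thread holds its own invoker. Activating a
// category swaps in a category-aware invoker for this thread only, so
// no volatile access or cross-thread visibility is needed.
interface Invoker {
    String invoke(String method);
}

class Invokers {
    static final Invoker PLAIN = m -> "plain:" + m;
    static final Invoker CATEGORY = m -> "category:" + m;
    static final ThreadLocal<Invoker> CURRENT =
            ThreadLocal.withInitial(() -> PLAIN);

    static void enterCategory() { CURRENT.set(CATEGORY); }
    static void leaveCategory() { CURRENT.set(PLAIN); }
    static String call(String m) { return CURRENT.get().invoke(m); }
}
```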

Of course I must admit I don't know all the details of categories.

> org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite#checkPojoMetaClass()
> contains the check we do for a cached java method. Something similar is done
> for method calls on groovy objects.

Yes, this will be interesting for us once we eliminate the wrappers
around Java objects. For the moment, it appears the wrappers make it
about twice as expensive for a Java object to enter Ruby in JRuby as
the same call would cost in Groovy.

- Charlie

Jochen Theodorou

Apr 20, 2010, 4:09:19 PM4/20/10
to jvm-la...@googlegroups.com
John Wilson wrote:
[...]
> Categories are thread specific so you don't need volatile access to see
> if a Category has been applied.
>
> You do need volatile to see if a global change has been made to the
> behaviour of a class and I can't see any way round that.

I partially agree. But what do I do if a non-initialized object becomes
visible because no volatile was used for that object? Isn't that part of
the double-checked locking idiom problem?

But thinking about your suggestion, it is most probably overkill to use
an AtomicInteger for the category usage counter. A simple counter should
do, since it is only an integer.

But making that change does not seem to have any effect yet.

John Wilson

Apr 20, 2010, 2:56:01 PM4/20/10
to jvm-la...@googlegroups.com
On 20 April 2010 18:52, Jochen Theodorou <blac...@gmx.org> wrote:
[...]

> I spent quite some time getting an overview and counted about 20 points
> where concurrent stuff is involved on the first call of a method. If the
> method is called again, this is reduced to some volatile accesses, to
> see if a category is active and some other things.
>
> [snip]

Categories are thread-specific, so you don't need volatile access to see if a Category has been applied.

You do need volatile to see if a global change has been made to the behaviour of a class, and I can't see any way round that.

John Wilson

David Schlosnagle

Apr 20, 2010, 11:56:21 PM4/20/10
to jvm-la...@googlegroups.com
Rich and Jochen,

Part of your needs might be met by Doug Lea's proposed Fences API [1]
[2] for JDK7, which allows an explicit volatile write (with the
appropriate memory barriers) without incurring the overhead of a
volatile read on each access. This fence could be used when a new
method/function is registered, while all the lookups would be regular
reference field accesses. The recent discussion of Fences and their
impact on the Atomic* classes [3] might also be of interest, since
potential updates to the atomic field updaters could help build the
functionality you desire.

[1]: http://cs.oswego.edu/pipermail/concurrency-interest/2009-January/005743.html
[2]: http://g.oswego.edu/dl/jsr166/dist/docs/java/util/concurrent/atomic/Fences.html
[3]: http://cs.oswego.edu/pipermail/concurrency-interest/2010-April/006990.html

Thanks,
Dave

Jochen Theodorou

Apr 21, 2010, 10:43:40 AM4/21/10
to jvm-la...@googlegroups.com
David Schlosnagle wrote:
> Rich and Jochen,
>
> Part of your needs might be met by Doug Lea's proposed Fences API [1]
> [2] for JDK7 by allowing an explicit volatile write (with the
> appropriate memory barriers), without incurring the overhead of a
> volatile read on each access. This fence could be used when a new
> method/function is registered, while all the lookups would be regular
> reference field accesses. The recent discussion of Fences and impacts
> to the Atomic* classes [3] might also be of interest as well since
> potential updates to the atomic field updaters could help build the
> functionality you desire.
>
> [1]: http://cs.oswego.edu/pipermail/concurrency-interest/2009-January/005743.html
> [2]: http://g.oswego.edu/dl/jsr166/dist/docs/java/util/concurrent/atomic/Fences.html
> [3]: http://cs.oswego.edu/pipermail/concurrency-interest/2010-April/006990.html

Those Fences sure look interesting to me. An explicit volatile write is
what I was looking for, so of course I am now excited to hear that
something like this is underway.

bye Jochen

--
Jochen "blackdrag" Theodorou
The Groovy Project Tech Lead (http://groovy.codehaus.org)
http://blackdragsview.blogspot.com/

Christian Vest Hansen

Apr 21, 2010, 1:10:06 PM4/21/10
to jvm-la...@googlegroups.com
Although I am not sure about the accuracy of the information, this
article claims that the only performance penalty of a volatile read on
x86 is the loss of optimization possibilities:
http://www.infoq.com/articles/memory_barriers_jvm_concurrency

With the Atomic* classes, the method lazySet (since 1.6) should be
equal to setting a final field but without the only-once restriction.
I interpret this to mean that writes in this thread that occur prior
to the lazySet call, will happen-before other threads see the value
from lazySet, whenever that is. I think the difference is that there
is no happens-before relationship between the lazySet and subsequent
reads in other threads.

This isn't the volatile-write/non-volatile-read scenario, but to me
the effect looks similar: the delayed visibility could just as well
have been caused by an optimization of a non-volatile read. And
although it seems consensus has yet to be reached, non-volatile
functionality appears likely to be coming to the Atomic*s in 1.7, in
some form.

But lazySet isn't terribly well documented and my understanding may be
flawed, so I recommend getting a second opinion from the
concurrency-interest list.
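For reference, the lazySet pattern under discussion looks roughly like this (illustrative names). Note that lazySet only cheapens the write side; get() is still a volatile read, so this does not by itself remove the read-side cost debated in this thread:

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch: the writer publishes with lazySet (an ordered
// store without a full fence); readers use get(). Visibility to other
// threads may be delayed, matching the relaxed requirement discussed
// earlier in the thread.
class MetaHolder {
    static final AtomicReference<String> META = new AtomicReference<>("v1");

    static void publish(String newMeta) {
        META.lazySet(newMeta);   // cheap ordered write, no full barrier
    }
    static String read() {
        return META.get();       // still a volatile read
    }
}
```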
--
Venlig hilsen / Kind regards,
Christian Vest Hansen.

Jochen Theodorou

Apr 21, 2010, 2:01:52 PM4/21/10
to jvm-la...@googlegroups.com
Christian Vest Hansen wrote:
> Although I am not sure about the accuracy of the information, this
> article claims that the only performance penalty of a volatile read on
> x86 is the loss of optimization possibilities:
> http://www.infoq.com/articles/memory_barriers_jvm_concurrency

does x86 include 64-bit systems? That is not fully clear in the
article. It is quite common these days (and will be even more so in the
future) to work on 64-bit systems. And the loss of optimization
possibilities can make HotSpot mostly useless. Not to forget
bytecode-instrumented systems that realize a single system across
multiple machines.

> With the Atomic* classes, the method lazySet (since 1.6) should be
> equal to setting a final field but without the only-once restriction.
> I interpret this to mean that writes in this thread that occur prior
> to the lazySet call, will happen-before other threads see the value
> from lazySet, whenever that is. I think the difference is that there
> is no happens-before relationship between the lazySet and subsequent
> reads in other threads.
>
> This isn't the volatile-write/non-volatile-read scenario, but to me
> the effect looks similar: the delayed visibility could just as well
> have been caused by an optimization of a non-volatile read. And
> although it seems consensus have yet to be reached, functionality for
> non-volatile appears to probably be coming to the Atomic*s in 1.7, in
> some form.
>
> But lazySet isn't terribly well documented and my understanding may be
> flawed, so I recommend getting a second opinion from the
> concurrency-interest list.

the basic question is whether I can do safe publication with lazySet or not.

bye blackdrag

Rémi Forax

Apr 25, 2010, 9:04:12 AM4/25/10
to jvm-la...@googlegroups.com
Hi Jochen,
Sorry not to respond earlier, but I was on vacation
with only a really slow internet connection :(
and a prototype of a new (JVM) language to finish :)

In my opinion, the only way to solve your problem,
and the problem of all dynamic languages that have
mutable metaclasses, is to rely on invalidation.

This mechanism is not yet implemented
in the reference implementation (JDK7),
but it lets you optimistically avoid doing a
metaclass-modification check at every call site.
Instead, the JSR 292 API
provides a way to bulk-invalidate some call sites (or all of them):
http://download.java.net/jdk7/docs/api/java/dyn/Linkage.html#invalidateAll%28%29
forcing each invalidated call site to re-query the bootstrap method
on the next invokedynamic.

For more details see
http://wiki.jvmlangsummit.com/DinnerAtPiatti
(search for section invalidation)

Rémi

Jochen Theodorou

unread,
Apr 25, 2010, 12:01:06 PM4/25/10
to jvm-la...@googlegroups.com
Rémi Forax wrote:
[...]
> For more details see
> http://wiki.jvmlangsummit.com/DinnerAtPiatti
> (search for section invalidation)

true, I totally forgot about invalidation... so how would I do it? Most
probably with a boolean in the call site that is not volatile and gets
flagged from the outside if a meta-class change happened. On each use of
the call site this flag would be checked to see whether the call site is
still valid.

The fear I have here is that we need to store the call sites in a way
that I can access. With invokedynamic this is one thing; without, it is
an entirely different matter. Without invokedynamic I have to maintain a
list of call sites, with each entry being removable from memory once
needed. Currently this kind of structure is in the class itself. If the
class is collected, the call site goes along with it. If I need global call
site entries I have two places referencing call sites. Well, I can work
with phantom references, I guess.

I should try it out at least.

bye blackdrag

--
Jochen "blackdrag" Theodorou
The Groovy Project Tech Lead (http://groovy.codehaus.org)
http://blackdragsview.blogspot.com/

Rémi Forax

unread,
Apr 25, 2010, 1:18:02 PM4/25/10
to jvm-la...@googlegroups.com
Le 25/04/2010 18:01, Jochen Theodorou a écrit :
> Rémi Forax wrote:
> [...]
>> For more details see
>> http://wiki.jvmlangsummit.com/DinnerAtPiatti
>> (search for section invalidation)
>
> true I fully forgot about invalidation... so how would I do it? most
> probably a boolean in the call site that is not volatile and gets
> flagged from outside if a meta class change happened. On each usage of
> the call site this flag would be checked to see if the call site is
> still valid.

Invalidation trashes the call site; you have to rebuild it.

>
> The fear I have here is that we need to store the call sites in a way
> that I can access. With invokedynamic this is one thing, without, it
> is a entirely different matter. Without invokedynamic I have to
> maintain a list of call sites with each entry being removable from
> memory once needed. Currently this kind of structure is in the class
> itself. If the class collected, the call site is along with it. If I
> need global call site entries I have two points referencing call
> sites. Well, I can work using phantom references I guess.
>

I don't think it's possible to do invalidation without proper
support from the VM.
When an invalidation is requested, all threads run until each of
them reaches a safepoint; once all threads are waiting, the call
sites in the bytecode can be reset.
Because Linkage.invalidateAll() is always called under a lock,
there is no publication problem if the bootstrap method looks up
the metaclass info under the same lock.

Example:
- a metaclass change:
  synchronized(metaclass_lock) {
    Linkage.invalidateAll();
    metaclass.change_method(...);
  }

- a bootstrap method:
  CallSite bootstrap(Class<?> declaringClass, String name, MethodType type) {
    CallSite callsite = ...
    MethodHandle target;
    synchronized(metaclass_lock) {
      target = metaclass.lookup_member(...);
    }
    callsite.setTarget(target);
    return callsite;
  }

Because you only do synchronization in the bootstrap method, and
this method is not called often (once initially and once per
invalidation), you should not have a performance problem.

I don't see how to emulate this.
If someone finds a way, I will be happy to implement the solution in the backport :)

> I should try it out at least.
>
> bye blackdrag
>

Rémi

Martin C. Martin

unread,
Apr 25, 2010, 1:35:28 PM4/25/10
to jvm-la...@googlegroups.com


On 4/25/2010 1:18 PM, Rémi Forax wrote:
>
> I don't think it's possible to do invalidation without the proper
> support by the VM.

As I understand it, even with callsite caching the callsite has to check
that the class of the receiver & arguments are the same as last time,
before it can use the method from last time. So at each callsite, we
need an object that records these things. When the receiver's metaclass
changes, couldn't it go through all such objects and set the receiver to
null or a sentinel value? They might have to be AtomicReferences in
that case.

Alternately, the object could contain a boolean "valid" field, and when
the metaclass is changed, it sets the field to false in all relevant objects.

I'm assuming that it's ok for the change to take effect at different times
in different threads.

Best,
Martin

Jochen Theodorou

unread,
Apr 25, 2010, 6:05:41 PM4/25/10
to jvm-la...@googlegroups.com
Rémi Forax wrote:
> Le 25/04/2010 18:01, Jochen Theodorou a écrit :
>> Rémi Forax wrote:
>> [...]
>>> For more details see
>>> http://wiki.jvmlangsummit.com/DinnerAtPiatti
>>> (search for section invalidation)
>>
>> true I fully forgot about invalidation... so how would I do it? most
>> probably a boolean in the call site that is not volatile and gets
>> flagged from outside if a meta class change happened. On each usage of
>> the call site this flag would be checked to see if the call site is
>> still valid.
>
> Invalidation trashes the call site; you have to rebuild it.

sure, but how is the trashing done? I know in the case of invokedynamic this
is done through the JVM, but I need something for Java 5 and Java 6 as well.

[...]
> I don't think it's possible to do invalidation without the proper
> support by the VM.
> When an invalidation is requested, all threads run until each one of
> them reach a safepoint,
> when all threads wait, callsites in the bytecode can be reset.

this sounds quite inefficient

> Because Linkage.invalidation() is always called under a lock,
> there is no publication problem if the bootstrap method lookups
> for metaclass info under the same lock.
>
> Example:
> - a metaclass change:
> synchronized(metaclass_lock) {
> Linkage.invalidateAll();
> metaclass.change_method(...);
> }
>
> - a bootstrap method:
> CallSite boostrap(Class<?> declaringClass, String name, MethodType type) {
> CallSite callsite = ...
> MethodHandle target;
> synchronized(metaclass_lock) {
> target = metaclass.lookup_member(...);
> }
> callsite.setTarget(target);
> return callsite;
> }

sure, that part is not the problem

[...]
> I don't see how to emulate this.
> If someone find I will be happy to implement the solution in the
> backport :)

This depends on what exactly you need. For Groovy I was thinking about
something like this:

- a guard in an invocation:
  if (invalid) {
    replace this callsite
  } else {
    target.invoke
  }

- after a meta-class change:
  for (callsite : relevantCallsites) { callsite.invalid = true }

bootstrap would still be something like the above. Since "invalid" is a
boolean, I don't have to fear partially initialized objects, and since I
don't need instant publication of the new value, I can let the change
piggy-back on another memory barrier. That means the call site will
be recognized as invalid eventually, at a later time.
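As a plain-Java sketch (all names invented; this is not Groovy's actual call-site code), the guard pattern might look like this, with the mutability hazard noted in a comment:

```java
// Sketch of a call site guarded by a plain (non-volatile) flag.
final class GuardedCallSite {
    // Both fields are deliberately non-volatile: the relaxed semantics
    // accept that other threads may see stale values for a while.
    // Note the hazard discussed above: because the site is mutable, a
    // racing reader could observe invalid == false together with an old
    // target unless the pair is published safely.
    boolean invalid = false;
    Runnable target;

    GuardedCallSite(Runnable initialTarget) { this.target = initialTarget; }

    void invoke() {
        if (invalid) {
            target = bootstrap();   // replace this call site's target
            invalid = false;
        }
        target.run();
    }

    // Stand-in for the real bootstrap / meta-class lookup.
    static Runnable bootstrap() {
        return () -> {};
    }
}
```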

But with this pattern I have the next problem: if the call site is not
immutable, then I again have to use synchronization or something similar,
and the total win is zero.

It is like a cat chasing its tail.

So no solution yet. That is why I was looking for something like fences.
The above can only be a solution for invokedynamic, but not for the
backport, unless you buy the horribly inefficient way. Well, not that
inefficient, but it is not good for performance at all.

bye blackdrag

--
Jochen "blackdrag" Theodorou
The Groovy Project Tech Lead (http://groovy.codehaus.org)
http://blackdragsview.blogspot.com/

Jochen Theodorou

unread,
Apr 25, 2010, 6:07:49 PM4/25/10
to jvm-la...@googlegroups.com
Martin C. Martin wrote:
>
>
> On 4/25/2010 1:18 PM, Rémi Forax wrote:
>>
>> I don't think it's possible to do invalidation without the proper
>> support by the VM.
>
> As I understand it, even with callsite caching the callsite has to check
> that the class of the receiver & arguments are the same as last time,
> before it can use the method from last time. So at each callsite, we
> need an object that records these things. When the receiver's metaclass
> changes, couldn't it go through all such objects and set the receiver to
> null or a sentinel value? They might have to be AtomicReferences in
> that case.

then you need mutable call site objects. I don't think that will get you
anywhere in the backport.

> Alternately, the object could contain a boolean "valid" field, and when
> the metaclass is changed, its sets it to false in all relevant objects.

yes, I was thinking the same, but for the reason above it does not work.
Not if you want to avoid memory barriers for each invocation.

bye blackdrag


--
Jochen "blackdrag" Theodorou
The Groovy Project Tech Lead (http://groovy.codehaus.org)
http://blackdragsview.blogspot.com/

Rémi Forax

unread,
Apr 25, 2010, 7:16:34 PM4/25/10
to jvm-la...@googlegroups.com
Le 26/04/2010 00:05, Jochen Theodorou a écrit :
> Rémi Forax wrote:
>> Le 25/04/2010 18:01, Jochen Theodorou a écrit :
>>> Rémi Forax wrote:
>>> [...]
>>>> For more details see
>>>> http://wiki.jvmlangsummit.com/DinnerAtPiatti
>>>> (search for section invalidation)
>>>
>>> true I fully forgot about invalidation... so how would I do it? most
>>> probably a boolean in the call site that is not volatile and gets
>>> flagged from outside if a meta class change happened. On each usage
>>> of the call site this flag would be checked to see if the call site
>>> is still valid.
>>
>> Invalidation trashes the call site; you have to rebuild it.
>
> sure, but how is the trashing done? I know in case of invokedynamic
> this is done through the JVM, but I need something for Java5 and Java6
> as well.

for Java 5 and Java 6, rely on polling at each call, and
advertise to users that if they use JDK 7 they will see a performance
boost :)

>
> [...]
>> I don't think it's possible to do invalidation without the proper
>> support by the VM.
>> When an invalidation is requested, all threads run until each one of
>> them reach a safepoint,
>> when all threads wait, callsites in the bytecode can be reset.
>
> this sounds quite inefficient

It's inefficient, but less so than polling at each invokedynamic.
This is how devirtualization and GC thread cooperation currently work,
at least in HotSpot and JRockit.
The problem with publication is that you can see a partially initialized value.
For example:
  metaclass.changeMethod();
  valid = true;

can be reordered to:
  valid = true;
  metaclass.changeMethod();

or worse, if changeMethod consists of two instructions:
  instruction1;
  valid = true;
  instruction2;

Also, maintaining a list of all call sites to invalidate may
not be worth it.

>
> But in this pattern I have the next problem. If the call site is not
> immutable, then I have again to use synchronization or something
> similar and the total win is again zero.

A call site is not immutable. You can change the target method handle, and
by design the target method handle is not stored in a volatile field.

>
> It is like a cat hunting its tail.
>
> So no solution yet. That is why I was looking for something like
> fences. The above can only be a solution for invokedynamic, but not
> for the backport unless you buy the horrible inefficient way. Well not
> that inefficient, but it is not good for performance at all.

Another solution is to not do the polling at each call but just increment
a counter and do the polling only, let's say, every 1000 calls.
If I finally decide to implement invalidation, I think I will implement it
like that.
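A hedged sketch of that counter idea (names invented; not the backport's actual code) could look like this:

```java
import java.util.function.Supplier;

// Call site that re-queries its target only every POLL_INTERVAL calls.
final class PollingCallSite {
    private static final int POLL_INTERVAL = 1000;
    // Plain int on purpose: lost updates under concurrency merely
    // stretch or shrink the interval; correctness only needs the
    // poll to happen eventually.
    private int counter;
    private Runnable target;
    private final Supplier<Runnable> lookup;

    PollingCallSite(Supplier<Runnable> lookup) {
        this.lookup = lookup;
        this.target = lookup.get();      // initial bind
    }

    void invoke() {
        if (++counter >= POLL_INTERVAL) {
            counter = 0;
            target = lookup.get();       // periodic re-query of the meta class
        }
        target.run();
    }
}
```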

>
> bye blackdrag
>

cheers,
Rémi

Rémi Forax

unread,
Apr 26, 2010, 5:24:13 AM4/26/10
to jvm-la...@googlegroups.com
Le 25/04/2010 19:35, Martin C. Martin a écrit :
>
>
> On 4/25/2010 1:18 PM, Rémi Forax wrote:
>>
>> I don't think it's possible to do invalidation without the proper
>> support by the VM.
>
> As I understand it, even with callsite caching the callsite has to
> check that the class of the receiver & arguments are the same as last
> time, before it can use the method from last time. So at each
> callsite, we need an object that records these things. When the
> receiver's metaclass changes, couldn't it go through all such objects
> and set the receiver to null or a sentinel value? They might have to
> be AtomicReferences in that case.
>
> Alternately, the object could contain a boolean "valid" field, and
> when the metaclass is changed, its sets it to false in all relevant
> objects.
>
> I'm assume that it's ok for the change to take effect at different
> times in different threads.
>
> Best,
> Martin

Hi Martin,
Yes, checking a volatile field at each call site works but seems to
perform badly.

Jochen Theodorou

unread,
Apr 26, 2010, 5:56:34 AM4/26/10
to jvm-la...@googlegroups.com
Rémi Forax wrote:
[...]
> call site is not immutable. You can change the target method handle and
> by design the target method handle is not stored in a volatile field.

Well, I am speaking about Groovy on pre-Java 7 here. I think for the
backport it might not be a good idea either. Of course I know this
conflicts with the nature of being a backport. But if you have to go
through a memory barrier each time you access the target of a call
site, then this surely is no win for performance. For me the
conclusion is that the code for pre-Java 7 and Java 7 will differ in more
ways than what a backport offers. This maybe means the invalidation, but
surely means the MethodHandle target.

[...]
> Another solution is to not do the polling at each call but just increment
> a counter and do the polling let say every 1000 calls.
> If I finally decide to implement invalidation, I think I will implement it
> like that.

Even though the counter would be accessed concurrently, a simple
non-volatile int field without any memory barriers on access would
be enough. It does not matter if it is not exactly every 1000 calls, I
guess. If 10 threads are running, it can very well happen that many
more than 10000 calls are made before new polling occurs. And it can
also happen that all threads reach the 1000 count within a short period
and you get a lot of polling.

I am a bit worried about hotspot here. And if you do it using memory
barriers it is not better than polling.

bye Jochen

--
Jochen "blackdrag" Theodorou
The Groovy Project Tech Lead (http://groovy.codehaus.org)
http://blackdragsview.blogspot.com/

Rich Hickey

unread,
Apr 26, 2010, 7:36:51 AM4/26/10
to jvm-la...@googlegroups.com
Thanks for the notes Rémi. That is the sort of thing I was talking
about.

If there is an internal facility for interacting with safe-points or
global critical sections, I think it should be made available to other
languages, independent of call sites and method handles etc.

We all can do call sites, caches etc ourselves *today* with classes,
methods and code gen (as demonstrated by the backport and our various
language efforts). Yes, JSR-292 *might* simplify that, to the extent
our use cases align with its design. But I think it would be better to
simply make classes, methods and code gen cheaper and faster, with
universal benefit, than building method handles, invokedynamic etc and
putting a new burden on VMs to be able to implement and optimize them,
when they can already optimize method calls very well.

The thing we cannot do is the safe-point stuff, and, if we could, we
might be getting the same perf as Java, today, for our dynamic calls.

Rich

Charles Oliver Nutter

unread,
Apr 26, 2010, 11:50:54 AM4/26/10
to jvm-la...@googlegroups.com
On Sun, Apr 25, 2010 at 5:05 PM, Jochen Theodorou <blac...@gmx.org> wrote:
> This depends on what exactly you need. For Groovy I was thinking about
> something like this:
>
> - guard in a invocation
> if (invalid) {
>   replace this callsite
> } else {
>   target.invoke
> }
>
> - after meta class change
> for (callsite : relevantCallsites) { callsite.invalid = true }
>
> bootstrap would still be something like the above. Since "invalid" is a
> boolean I don't have to fear partially initialized objects and since I don't
> need instant publication of the new value, I can let the change ride piggy
> back on another memory barrier. That means the call site will be recognized
> as invalid eventually at a later time.
>
> But in this pattern I have the next problem. If the call site is not
> immutable, then I have again to use synchronization or something similar and
> the total win is again zero.

JRuby used to actively invalidate call sites, but the synchronization
issues were a major headache. I didn't spend a lot of time researching
it, but it did not appear to be possible to invalidate a mutable call
site without introducing some kind of lock, which is obviously not an
option for a fast call site.

The JDK7 way could work, however, if the invalidation resulted in a
call site being totally inaccessible. In-flight code would use it as
though it had not been invalidated, and new calls would re-bootstrap.
But it still comes down to a primary question:

* How quickly do you want all threads to see that a callsite is invalid?

In JRuby right now it's expected that you are not going to be doing
method table changes (class mutation) across running threads. So the
poll from the callsite to the class is non-volatile. It's possible for
a thread to not see that a class has changed right away (or following
the memory model to the letter...it might never see it) if the change
happens in another thread and nothing causes the appropriate flags to
propagate to the current thread. It is, at some level, "piggy-backing"
on other memory barriers.

If you need the change to be visible immediately, I don't see any
other way with Java5/6 than using volatiles.

- Charlie

Jochen Theodorou

unread,
Apr 26, 2010, 4:34:55 PM4/26/10
to jvm-la...@googlegroups.com
Charles Oliver Nutter wrote:
[...]
> * How quickly do you want all threads to see that a callsite is invalid?
>
> In JRuby right now it's expected that you are not going to be doing
> method table changes (class mutation) across running threads. So the
> poll from the callsite to the class is non-volatile. It's possible for
> a thread to not see that a class has changed right away (or following
> the memory model to the letter...it might never see it) if the change
> happens in another thread and nothing causes the appropriate flags to
> propagate to the current thread. It is, at some level, "piggy-backing"
> on other memory barriers.
>
> If you need the change to be visible immediately, I don't see any
> other way with Java5/6 than using volatiles.

I was thinking about changing the semantics to say that if one thread does a
modification, another thread will see it eventually. If the user wants to
ensure it is visible, he has to use synchronization himself.

My problem is more that if I want to use non-volatile reads, then all
the reads have to access immutable data. And this is what is causing
me headaches.

bye Jochen

--
Jochen "blackdrag" Theodorou
The Groovy Project Tech Lead (http://groovy.codehaus.org)
http://blackdragsview.blogspot.com/

Charles Oliver Nutter

unread,
Apr 29, 2010, 8:46:40 PM4/29/10
to jvm-la...@googlegroups.com
On Mon, Apr 26, 2010 at 3:34 PM, Jochen Theodorou <blac...@gmx.org> wrote:
> I was thinking about changing semantics to say if one thread does a
> modification another thread will see it eventually. If the user wants to
> ensure it is visible he has to use synchronization by himself.
>
> My problem is more, that if I want to use non-volatile reads, then all the
> reads do have to access immutable data. And this is, what is causing me
> headaches.

In JRuby the cached data is immutable; it's a final tuple of a token
(for validation) and the target method. Our metaclasses provide and
store this tuple when we query them for methods. So the other pieces
that do get mutated:

* The tuple reference in the call site, which is non-volatile.
However, it's ok if it varies from thread to thread for some amount of
time since we re-validate it on every call.
* The token in each metaclass, also non-volatile. But we have accepted
that making class updates across threads will lead to unpredictable
results, so it's discouraged and we make no guarantees.
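A rough sketch of that scheme (all names invented; this is not JRuby's actual code) might look like this: the cached tuple is immutable, and only the plain references to it and to the token get mutated:

```java
// Immutable cache tuple: a generation token plus the resolved target.
final class CacheEntry {
    final int token;          // meta-class generation at lookup time
    final Runnable method;    // the resolved target method
    CacheEntry(int token, Runnable method) {
        this.token = token;
        this.method = method;
    }
}

final class SketchMetaClass {
    int token;                               // bumped (non-volatile) on mutation
    private Runnable method = () -> {};

    void redefine(Runnable m) { method = m; token++; }
    CacheEntry lookup() { return new CacheEntry(token, method); }
}

final class TupleCallSite {
    private CacheEntry cache;                // plain field; may lag behind

    Runnable resolve(SketchMetaClass mc) {
        CacheEntry c = cache;
        // Re-validate on every call: a stale or missing tuple forces a
        // fresh lookup; a current one is used as-is.
        if (c == null || c.token != mc.token) {
            c = mc.lookup();
            cache = c;
        }
        return c.method;
    }
}
```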

- Charlie

John Rose

unread,
Oct 1, 2010, 3:44:47 AM10/1/10
to jvm-la...@googlegroups.com
I had a chance to sit down with Rich at JavaOne to discuss this. (Thanks Rich!)

The key problem is caching the result of a partial evaluation. (For example, an expected method dispatch at a call site, or a metaclass which summarizes the shape of a type.) The cached value can be made "pure" (read-only) but it is subject (at least potentially) to very slow change, once per "epoch". Let's call such a slowly varying value a "stable value".

(Compare the "stationary values" of Ian Rogers's paper for PPPJ'08. What dynamic language folks need is an epochal variation of this concept: a value which is stationary most of the time and changes once in a blue moon. Hence, "stable value".)

The typical dynamic language technique is to use a volatile variable to hold the cached value. The volatility ensures that once-per-epoch changes get immediately picked up by threads. The problem is that the volatile reference defeats caching and inlining of all sorts.

The JVM has better tricks. It can cache and inline a vtable lookup if it can prove (from class hierarchy analysis) that a method is not overridden. The invokedynamic instruction has similar capabilities. If the epoch changes (because a new overriding method is loaded), code needs to be edited or discarded, and threads need to be interrupted. This happens at a safepoint.

Here's a sketch for a user-level hook to stable values:

public class StableValue<T> {
/** Get the current value. */
public T getStableValue() {...}
/** Set the current value. This may be costly. */
public void getStableValue(T x) {...}
/** Commit the current value. Future calls to set() will throw. */
public void getStableValue() {...}
}

I'm going to abbreviate "getStableValue" as "get", etc., from here on. It is tempting to make an analogy with java.lang.ref.Reference, but I don't know if that makes enough sense. By using the full term "getStableValue" we can contemplate mixing stable values into other types:
public class CallSite extends StableValue<MethodHandle> { ... }

Anyway, the idea is that the JVM is encouraged (by the user) to perform optimizations on the value returned by the get() call. If it does so, it must somehow watch for future calls to set(). The freeze() call (from the user) informs the JVM that it can stop watching. This should probably be a system-defined class, because it may have private fields which participate in the JVM's task of tracking dependencies.

Of course, static variables in Java are somewhat amenable to this treatment "automagically". But the above abstraction guides the JVM much more clearly. I don't think we can retrofit this behavior onto normal Java variables. (We can with "static final" of course, but those guys can't evolve over the epochs.) It's better to be explicit here, I think.

StableValue should be subclassable, in the same style as java.lang.ref classes. That will remove gratuitous indirections. For example, a StableHashTable could be built whose entries subclass StableValue, much like a WeakHashTable.

It should be possible to build lazy values on top of StableValues, by overriding get() to check if the current value is null. (Another reason to subclass.)
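A plain-Java stand-in (StableValue above is only a sketch, not a real JDK API; this just illustrates the lazy-subclass idea without any of the intended JVM support):

```java
import java.util.function.Supplier;

// Stand-in for the sketched StableValue; a real implementation would
// carry JVM-level dependency tracking and memory-ordering guarantees.
class StableValueSketch<T> {
    private T value;
    public T getStableValue() { return value; }
    public void setStableValue(T x) { value = x; }
}

// Lazy values built on top, by overriding get() to check for null.
class LazyValue<T> extends StableValueSketch<T> {
    private final Supplier<T> init;
    LazyValue(Supplier<T> init) { this.init = init; }

    @Override
    public T getStableValue() {
        T v = super.getStableValue();
        if (v == null) {               // compute on first use, then cache
            v = init.get();
            setStableValue(v);
        }
        return v;
    }
}
```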

The StableValue.set call should perform the memory effects of a release (cf. monitorexit) before the value is published.

The StableValue.get call should perform the memory effects of an acquire (cf. monitorenter) before the value is fetched, the *first* time any given thread calls get() on a given StableValue. Subsequent get() calls are "for free". This is why a safepoint mechanism is needed, to force querying threads to re-acquire relevant memory blocks.

A StableValue can be "entangled" with optimized code if its value is folded into the code. If this happens, the JVM needs to keep track of the StableValue so that it can adjust or discard the code the next time the value changes. The StableValue class probably needs invisible synthetic fields to help it link into the JVM's data structures for tracking code dependencies.

The caching behavior of java.dyn.CallSite with respect to setTarget could then be defined in terms of StableValue. It will be "as if" the CallSite held its current target in a StableValue.

Class metadata (of type MyMD) can be stored in a ClassValue<StableValue<MyMD>>. For information on ClassValue, see:
http://cr.openjdk.java.net/~jrose/pres/indy-javadoc-mlvm/java/dyn/ClassValue.html

-- John

P.S. None of the above helps with the special case of getting invalidation when types gain new subtypes. There's another hook needed there, I think. For example, ClassValues could be optionally made sensitive to class hierarchy changes, so that they get recomputed if new subtypes are loaded. Not sure if this is needed, though.

John Rose

unread,
Oct 1, 2010, 3:49:32 AM10/1/10
to jvm-la...@googlegroups.com
Fixing a couple of typos:

public class StableValue<T> {
 /** Get the current value. */
 public T getStableValue() {...}
 /** Set the current value.  This may be costly. */
 public void setStableValue(T x) {...}

 /** Commit the current value.  Future calls to set() will throw. */
 public void freezeStableValue() {...}
}

Rémi Forax

unread,
Oct 1, 2010, 3:55:01 AM10/1/10
to jvm-la...@googlegroups.com
I like the concept, but I want to try to propose a better API this afternoon.

Rémi

John Rose

unread,
Oct 1, 2010, 6:02:45 PM10/1/10
to jvm-la...@googlegroups.com
Thanks, David; that was a good pointer. Turning a volatile field into a normal one is good when it's possible.

For StableValue, I think there needs to be (a) a better guarantee of responsiveness to changes than normal fields provide, and (b) a clear path to optimizing the stable value as if it were "mostly constant" in the same sense that the bodies of not-yet-overridden virtual methods are "mostly constant" today.

Does something like StableValue belong in the Fences API, I wonder?

-- John

Charles Oliver Nutter

unread,
Oct 1, 2010, 11:13:01 PM10/1/10
to jvm-la...@googlegroups.com
I will say that this helps the volatile reference problem we all face,
but I know Jochen and I are still looking for a way to completely get
rid of our soft/weak proxy/metaclass wrappers :( I'm this || close to
hacking Maxine to have an additional mutable reference on all objects
just so I can see how it would play out...

On Fri, Oct 1, 2010 at 3:44 AM, John Rose <john....@oracle.com> wrote:
> I had a chance to sit down with Rich at JavaOne to discuss this.  (Thanks Rich!)
>
> The key problem is caching the result of a partial evaluation.  (For example, an expected method dispatch at a call site, or a metaclass which summarizes the shape of a type.)  The cached value can be made "pure" (read-only) but it is subject (at least potentially) to very slow change, once per "epoch".  Let's call such a slowly varying value a "stable value".
>
> (Compare "stationary values" Ian Rogers's paper for PPPJ'08.  What dynamic language folks need is an epochal variation of this concept, a value which is stationary most of the type and changes once in a blue moon.  Hence, "stable value".)
>
> The typical dynamic language technique is to use a volatile variable to hold the cached value.  The volatility ensures that once-per-epoch changes get immediately picked up by threads.  The problem is that the volatile reference defeats caching and inlining of all sorts.
>
> The JVM has better tricks.  It can cache and inline a vtable lookup if it can prove (from class hierarchy analysis) that a method is not overridden.  The invokedynamic instruction has similar capabilities.  If the epoch changes (because a new overriding method is loaded), code needs to be edited or discarded, and threads need to be interrupted.  This happens at a safepoint.
>
> Here's a sketch for a user-level hook to stable values:
>
> public class StableValue<T> {
>  /** Get the current value. */
>  public T getStableValue() {...}
>  /** Set the current value.  This may be costly. */
>  public void getStableValue(T x) {...}
>  /** Commit the current value.  Future calls to set() will throw. */
>  public void getStableValue() {...}
> }
>
> I'm going to abbreviate "getStableValue" as "get", etc., from here on.  It is tempting to make an analogy with java.lang.ref.Reference, but I don't know if that makes enough sense.  By using the full term "getStableValue" we can contemplate mixing stable values into other types:
>  public class CallSite extends StableValue<MethodHandle> { ... }
>
> Anyway, the idea is that the JVM is encouraged (by the user) to perform optimizations on the value returned by the get() call.  If it does so, it must somehow watch for future calls to set().  The freeze() call (from the user) informs the JVM that it can stop watching.  This should probably be a system-defined class, because it may have private fields which participate in the JVM's task of tracking dependencies.

It seems like it's in the family of Atomic* to me, and as you
mentioned in a followup email we need to make the set+freeze
operation (at minimum) atomic. So why not just borrow the
AtomicReference API and have AtomicStableReference or
StableAtomicReference (and perhaps AtomicStable*) that also implement
getAndSet, incrementAndSet, incrementAndFreeze, and so on?

I also liked Doug Lea's "fences" API, though I guess that's off the
table for Java 7 now. It was certainly more flexible, since it allowed
us more freedom about which fields to use (and therefore we would not
have to change current impls...we just fence them), but it is perhaps
more invasive at a VM level.

> Of course, static variables in Java are somewhat amenable to this treatment "automagically".  But the above abstraction guides the JVM much more clearly.  I don't think we can retrofit this behavior onto normal Java variables.  (We can with "static final" of course, but those guys can't evolve over the epochs.)  It's better to be explicit here, I think.
>
> StableValue should be subclassable, in the same style as java.lang.ref classes.  That will remove gratuitous indirections.  For example, a StableHashTable could be built whose entries subclass StableValue, much like a WeakHashTable.
>
> It should be possible to build lazy values on top of StableValues, by overriding get() to check if the current value is null.  (Another reason to subclass.)

compareAndSet?

> The StableValue.set call should perform the memory effects of a release (cf. monitorexit) before the value is published.
>
> The StableValue.get call should perform the memory effects of an acquire (cf. monitorenter) before the value is fetched, the *first* time any given thread calls get() on a given StableValue.  Subsequent get() calls are "for free".  This is why a safepoint mechanism is needed, to force querying threads to re-aquire relevant memory blocks.
>
> A StableValue can be "entangled" with optimized code if its value is folded into the code.  If this happens, the JVM needs to keep track of the StableValue so that it can adjust or discard the code the next time the value changes.  The StableValue class probably needs invisible synthetic fields to help it link into the JVM's data structures for tracking code dependencies.

One disadvantage this has compared to "fences" is that we will always
have to walk a reference to get to the StableValue object. Perhaps we can roll
in the optimization Tom R. came up with to fold away repeated accesses
of the same final field reference? Or will Hotspot see through the
field reference + StableValue.get and fold the whole thing together?

> The caching behavior of java.dyn.CallSite with respect to setTarget could then be defined in terms of StableValue.  It will be "as if" the CallSite held its current target in a StableValue.
>
> Class metadata (of type MyMD) can be stored in a ClassValue<StableValue<MyMD>>.  For information on ClassValue, see:
>  http://cr.openjdk.java.net/~jrose/pres/indy-javadoc-mlvm/java/dyn/ClassValue.html

Ah-ha, this is what I missed before. This definitely does help with
the Class/MetaClass relationship most of us will have to deal with, so
with this and StableValue we're really getting some powerful tools
here.
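
A rough sketch of that Class/MetaClass pattern, written against the ClassValue API from the linked javadoc (which later landed as java.lang.ClassValue). MetaClass and the AtomicReference holder standing in for StableValue are hypothetical:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical runtime metaclass; its contents don't matter for the sketch.
final class MetaClass {
    final Class<?> theClass;
    MetaClass(Class<?> c) { this.theClass = c; }
}

final class MetaClassRegistry {
    // One mutable holder per Class, computed on first use. The holder can be
    // invalidated and re-primed without disturbing the ClassValue association.
    private static final ClassValue<AtomicReference<MetaClass>> META =
        new ClassValue<AtomicReference<MetaClass>>() {
            @Override
            protected AtomicReference<MetaClass> computeValue(Class<?> type) {
                return new AtomicReference<>(new MetaClass(type));
            }
        };

    static MetaClass metaClassOf(Object x) {
        return META.get(x.getClass()).get();
    }

    static void invalidate(Class<?> type) {
        META.get(type).set(new MetaClass(type));  // readers see it on next lookup
    }
}
```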

> -- John
>
> P.S. None of the above helps with the special case of getting invalidation when types gain new subtypes.  There's another hook needed there, I think.  For example, ClassValues could be optionally made sensitive to class hierarchy changes, so that they get recomputed if new subtypes are loaded.  Not sure if this is needed, though.

Now I get this statement as well. If a new subtype comes in you may or
may not want the parent's ClassValue to apply down-hierarchy. If you
don't want it to apply, you need a way to invalidate and re-prime
ClassValues for the super and subtype. Agreed...there's a mechanism of
some sort needed here, OR ClassValues could be limited to exactly one
type from one classloader, and we'll handle the overhead of binding
new types that enter the system.

- Charlie

John Rose

Oct 2, 2010, 3:33:54 PM10/2/10
to jvm-la...@googlegroups.com
FTR... Comments by Doug. -- John

Begin forwarded message:

From: Doug Lea <d...@cs.oswego.edu>
Date: October 2, 2010 7:10:47 AM PDT
To: John Rose <john....@oracle.com>
Subject: Re: is there a fence for this?

On 10/01/10 17:56, John Rose wrote:
> I'm beginning to think that both mutable invokedynamic call sites and cached
> meta-classes are instances of the same pattern, a "stable value", which the
> JVM does not support well. But JSR 292 can supply a new hook, if JSR 166
> does not already.

We (JSR166) don't have anything directly applicable.
Here are a few thoughts. Feel free to relay any or all of this
to jvm-languages list.

Memory-model-wise, I suppose you'd classify this as a
weird form of "Release consistency":
Readers running between releases are allowed to act as if
values do not change. Here, the release points correspond
to epochs, and we assume that there is some sort of sync
so that writers don't interfere with each other upon
setting up new epochs. And possibly (but, I suspect, not
necessarily) a full all-thread r/w barrier upon epoch advance.
But the main twist on release-consistency is that reader-sync
may (or may not) entail new code generation.

In principle, this could be addressed with scoped fences.
(as in the ones at
http://gee.cs.oswego.edu/dl/jsr166/dist/docs/java/util/concurrent/atomic/Fences.html)

Note: I suspect that the Fences API won't make Java7.
It wasn't in our recent jsr166/openjdk integration, due to
the combination of mismatches with existing JLS ch 17
specs (which would need even more revisions/fixes than they
already do) and the continual moral-hazard complaints --
i.e., arguments that the world is better off if the only
people using them are those who can do so via Unsafe.

Scoped fences map pretty well to release consistency (as
well as to most others), and so ought to apply here.
(As a thought experiment, consider an implementation of
even ordinary fences that might involve code regeneration
on some forms of updates.) Still, the actual APIs would
probably need some adaptation if only because there is
no Java notion of the associated scope/epoch. And pragmatically,
you'd want these fences to be treated very specially anyway.
I think you'd only need one or two forms (not the usual three),
mainly (using made-up Epoch type):
orderReadsByEpoch(Epoch e);
and maybe (but probably left for JVM-internal use only)
orderWritesByEpoch(Epoch e)
with specs similar to plain orderReads/orderWrites.
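
As a user-level illustration of what epoch-ordered reads would buy (Epoch is made up, as above, and a real scoped fence would let the JVM drop the per-read generation check entirely):

```java
import java.util.concurrent.atomic.AtomicLong;

// Made-up Epoch type: a generation counter advanced by writers.
final class Epoch {
    private final AtomicLong generation = new AtomicLong();
    long current() { return generation.get(); }        // acquire-ish read
    void advance() { generation.incrementAndGet(); }   // release on advance
}

// Per-reader view of an epoch-published value: reads between advances hit a
// plain cached field, i.e. the moral equivalent of orderReadsByEpoch(e).
final class EpochReader<T> {
    private final Epoch epoch;
    private volatile T shared;         // written under the writer's own sync
    private long seenGeneration = -1;  // thread-confined
    private T cached;                  // thread-confined

    EpochReader(Epoch epoch, T initial) { this.epoch = epoch; this.shared = initial; }

    void publish(T value) { shared = value; epoch.advance(); }

    T get() {
        long g = epoch.current();
        if (g != seenGeneration) {     // only re-acquire when the epoch moved
            cached = shared;
            seenGeneration = g;
        }
        return cached;
    }
}
```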

Interestingly enough, the similarities appear to overlap those
wrt "freeze" actions described (with bugs) in JLS sec 17.5
(http://java.sun.com/docs/books/jls/third_edition/html/memory.html)
that do not have an independent implementation in any JVM
(but could be). "Freezes" are an attempt to provide an account
of some final-field oddities. Beware that there are
unexpected but allowed behaviors with not-really-final
final fields, that may also be shared by epoch-based control.

Given the above, "stable" would not be my favorite
term for this mode. Also, my experience with trying
to use per-variable type-qualifier-like or magic-type/class
notions in Java (as in your "StableValue")
to map to fences is not very positive. Even "volatile" and
"final" are problematic. On the other hand, it is fine,
and often preferable to only get fences as parts of
special-mode accesses. You implicitly do this in methods
like "getStableValue", but doing it only per-access
leads to the usual problems we face in not being
able to efficiently support l-value operations.
(Aside: A different solution to this will probably eventually
be part of X10 spec: using flavored blocks for fence control.
As in
finish { p = x; } f = p.f // volatile-style field read
I wish there were some syntax holes for this sort of
thing in Java....)

Which leads me at the moment to suggest just adding one or two
fence-like methods for epochs.

-Doug

John Rose

Oct 2, 2010, 10:38:47 PM10/2/10
to jvm-la...@googlegroups.com
On Oct 1, 2010, at 8:13 PM, Charles Oliver Nutter wrote:

> I will say that this helps the volatile reference problem we all face,
> but I know Jochen and I are still looking for a way to completely get
> rid of our soft/weak proxy/metaclass wrappers :( I'm this || close to
> hacking Maxine to have an additional mutable reference on all objects
> just so I can see how it would play out...

Thomas Wuerthinger's enhanced hotswap work gets moderately close to this, and he and I talked at JavaOne about what it would take to go the rest of the way. The main problem is that the HotSpot JVM thinks it knows some hard-coded offsets into well-known classes like String. (It's an old problem http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4953140 .) The 292 work makes it worse, since there is more coupling between the JVM and the java.dyn classes.

(Long ago, I was on the technical due-diligence team to see if Sun wanted to acquire this strange HotSpot technology. The code looked well-factored, so we tried adding a field to all object headers, a field containing the backtrace as of the point the object was allocated. It took an afternoon, and worked the first time.)

> It seems like it's in the family of Atomic* to me, and as you
> mentioned in a followup email we need to make the set+freeze
> operation (at minimum) be atomic. So why not just borrow the
> AtomicReference API and have AtomicStableReference or
> StableAtomicReference (and perhaps AtomicStable*) that also implement
> getAndSet, incrementAndSet, incrementAndFreeze, and so on?

That's a good idea. It's not likely to happen in a user visible way in JDK 7, though.

> I also liked Doug Lea's "fences" API, though I guess that's off the
> table for Java 7 now. It was certainly more flexible, since it allowed
> us more freedom about which fields to use (and therefore we would not
> have to change current impls...we just fence them), but it is perhaps
> more invasive at a VM level.

Not very much more invasive. The JVM immediately cooks those things to fences, anyway:
http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/tip/src/share/classes/java/util/concurrent/atomic/AtomicInteger.java
http://hg.openjdk.java.net/jdk7/jdk7/hotspot/file/tip/src/share/vm/opto/library_call.cpp

(See lazySet, and compare inline_unsafe_ordered_store with inline_unsafe_access.)

>> A StableValue can be "entangled" with optimized code if its value is folded into the code. If this happens, the JVM needs to keep track of the StableValue so that it can adjust or discard the code the next time the value changes. The StableValue class probably needs invisible synthetic fields to help it link into the JVM's data structures for tracking code dependencies.
>
> One disadvantage this has compared to "fences" is that we will have to
> always walk a reference to get to the StableValue object. Perhaps we can roll
> in the optimization Tom R. came up with to fold away repeated accesses
> of the same final field reference?

Yes, that's already partially present in HotSpot, but just for method handles. Tom's version is much more general.

(The key problem here is that finals are not really final, in the JVM's view, because constructors and reflection are allowed to change them.)

> Or will Hotspot see through the
> field reference + StableValue.get and fold the whole thing together?

It depends on the root reference to the StableValue, of course. In the case of an invokedynamic call site, there is a stable reference to the CallSite, so if it incorporates a StableValue (or equivalent behavior), the extra indirection is routinely folded up inside the compilation of the call site.

In the case of metaclass extraction from a non-constant x.getClass call, it's a chain of however many indirections there are in the data structure.

>> The caching behavior of java.dyn.CallSite with respect to setTarget could then be defined in terms of StableValue. It will be "as if" the CallSite held its current target in a StableValue.
>>
>> Class metadata (of type MyMD) can be stored in a ClassValue<StableValue<MyMD>>. For information on ClassValue, see:
>> http://cr.openjdk.java.net/~jrose/pres/indy-javadoc-mlvm/java/dyn/ClassValue.html
>
> Ah-ha, this is what I missed before. This definitely does help with
> the Class/MetaClass relationship most of us will have to deal with, so
> with this and StableValue we're really getting some powerful tools
> here.

Yes. Let's get that one right.

>> P.S. None of the above helps with the special case of getting invalidation when types gain new subtypes. There's another hook needed there, I think. For example, ClassValues could be optionally made sensitive to class hierarchy changes, so that they get recomputed if new subtypes are loaded. Not sure if this is needed, though.
>
> Now I get this statement as well. If a new subtype comes in you may or
> may not want the parent's ClassValue to apply down-hierarchy. If you
> don't want it to apply, you need a way to invalidate and re-prime
> ClassValues for the super and subtype. Agreed...there's a mechanism of
> some sort needed here, OR ClassValues could be limited to exactly one
> type from one classloader, and we'll handle the overhead of binding
> new types that enter the system.

The only version I have now is ignorant of subtyping relations, so each class is treated as a unique thing. That's sufficient for now, and is probably all that can be done for JDK 7.

Thanks for the searching comments and questions.

-- John

Doug Lea

Dec 10, 2010, 8:08:28 AM12/10/10
to John Rose, Cliff Click, Brian Goetz, Charles Nutter, Jochen Theodorou, Rich Hickey, Rémi Forax, jsr-2...@jcp.org, Da Vinci Machine Project, jvm-la...@googlegroups.com
On 12/09/10 19:09, John Rose wrote:
> I started a thread on Google Groups to get more advice on safepoint-based
> invalidation, which the EG is naming MutableCallSite#sync.
>
> http://groups.google.com/group/jvm-languages/browse_thread/thread/9c9d3e84fc745676#
>

TL;DR version: The scheme you laid out here seems sound.

(To avoid relays/forks of discussions, I tried joining and CCing the group)

My only concern about this is whether it precludes
further useful support for the general problem
of "phased computation", which is in turn either a
variant of or a generalization of fork/join computation,
depending on how you look at it. Since JDK7 includes
integrated support for both ForkJoinTask and Phaser,
both of which will probably be used in JDK8 lambdas etc.,
it seems almost a sure thing that people will further
explore underlying JVM support. Here's a synopsis of
issues. (Please bear with the background before
returning to questions at hand.)

Phased and fork/join computations impose enough structure
on parallel computations to allow cheaper low-level
memory-level accesses. Well, at least when they are used in
the intended ways; among the main differences between JDK7
support and explicitly parallel languages like X10
(supporting "clocks" (a form of Phasers) and async/finish
(a form of fork/join)) is that Java compilers cannot
enforce correct usage (although tools and IDE plugins
could be improved/developed that could do so in most cases).

As warmup, for fork/join, the main economies are due to the
static DAG structure of computation. Among other impacts,
accessing a variable modified by a joined task requires no
memory-level sync beyond what was required anyway
to determine the join. (We currently have no way to
designate such variables or usages; better support
probably awaits this.)

For phased computations, the main economies apply
to "phased" variables -- those that may take a
new value on each phase but are constant within phases.
(We also don't have a way of designating these.)
Conceptually, a phased variable is a holder of
two values/variables, call them "current" and "next",
where reads are from "current" and writes are
to "next". In existing JMM-ese, "current" is basically a
non-volatile variable, and "next" is volatile. And
conceptually, upon each phase advance, each party
does "current = next". There are a few ways to
implement this differently though. One common option
is to use only a single location per phased variable,
and to arrange an action taken upon phase advance
to do in-place updates (see for example Phaser.onAdvance,
http://gee.cs.oswego.edu/dl/jsr166/dist/docs/java/util/concurrent/Phaser.html ),
which must by nature be performed in exclusion since
all other parties are blocked until advance. Another
option is to use a two-slot array per phased variable,
indexed by the lowest bit of phase number: reads are
from slot[phase & 1], writes are to slot[~phase & 1]
(assuming that phase numbers are explicit and sequential).
This avoids the need for copying. Also, across
all of these, it is possible and sometimes desirable for
parties to use their own writes early, before advance.
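
A minimal single-variable mock-up of the two-slot option (names made up; real uses would drive advance() from something like Phaser.onAdvance):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

final class PhasedVariable<T> {
    // slot[phase & 1] is "current", slot[~phase & 1] is "next"
    private final AtomicReferenceArray<T> slot = new AtomicReferenceArray<>(2);
    private final AtomicInteger phase = new AtomicInteger();

    PhasedVariable(T initial) { slot.set(0, initial); }

    T read() {                           // constant within a phase
        return slot.get(phase.get() & 1);
    }

    void write(T value) {                // invisible until the next advance
        slot.set(~phase.get() & 1, value);
    }

    void advance() {                     // one party runs this per phase change;
        phase.incrementAndGet();         // swaps current/next with no copying
    }
}
```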

Sadly enough, I don't know of a recipe determining which
of these options is best in a given situation. (Although
I do know one of the ingredients: whether you require
atomicity of updates across all parties upon advance.)
Maybe some further study and exploration could arrive at a
good recipe.

In the mean time, note that only one of these options
(explicit current+next) even has a reasonable interpretation
in existing JVM-ese. The others are mismatched in that
some of the operations have JMM-volatile semantics and some
don't, so it doesn't make sense to declare the variable
as either volatile or non-volatile.

John's proposal for MutableCallSite falls into this category,
hence the controversy about how to express it.
It is a specialization of the onAdvance (aka sync()) approach
with in-place updates, and safepoint generations serving
as phases. This seems reasonable given the likely need for
atomicity during safepoints anyway.

While ideally it would be nice to integrate this with
other phased computation even now (I can imagine defining
adjunct classes for use with j.u.c.Phaser), I'm not too
tempted to do so right now, because it is not always the
best available option.

However, someday, it would be great to be able
to allow use the same mechanics in each. As parallelism
becomes more widespread, we must make it easiest and
most attractive for people to use simpler constrained
execution control patterns like phasers and FJ rather than the
ad-hoc unstructured use of threads/volatiles/locks. One
way to help reach this goal is to improve JVM-level support
so that the most structured parallel code is also the fastest
parallel code.

-Doug

John Rose

Dec 10, 2010, 9:04:11 PM12/10/10
to Doug Lea, Cliff Click, Brian Goetz, Charles Nutter, Jochen Theodorou, Rich Hickey, Rémi Forax, jsr-2...@jcp.org, Da Vinci Machine Project, jvm-la...@googlegroups.com
On Dec 10, 2010, at 5:08 AM, Doug Lea wrote:

> On 12/09/10 19:09, John Rose wrote:
>> I started a thread on Google Groups to get more advice on safepoint-based
>> invalidation, which the EG is naming MutableCallSite#sync.
>>
>> http://groups.google.com/group/jvm-languages/browse_thread/thread/9c9d3e84fc745676#
>>
>
> TL;DR version: The scheme you laid out here seems sound.

Thanks very much, Doug. I assume you are primarily referring to part (a) of my question:

>> In order to work, the specification has to (a) make logical sense in the terms of the JMM, (b) be reasonably implementable by JVMs, and (c) be useful to programmers.

Does anyone have a take on (c)? (See my comments below on using this API...)

> My only concern about this is whether it precludes
> further useful support for the general problem

> of "phased computation"...

I hope not, though JSRs 292 and 166 are an instance of:
http://en.wikipedia.org/wiki/Conway's_Law

The worst case is that we are seeing the good being the enemy of the better. The best case is we are doing initial exercises in JDK 7 which will get well-structured in JDK 8. I vote for best case.

The device of associating a "nonce-volatile" with a safepoint-phased variable update is generally applicable, not just to MutableCallSite. Perhaps once we have a suitable LValue abstraction (able to efficiently refer to x.f or a[i], which are your two implementations of phased variables), we can make a standard operation LValue.setOnceVolatile.

Doug, you've asked before for LValue. As we've discussed, in order to do LValue, the JVM needs very reliable scalarization. This means alleviating certain obstacles from the JOM (Java Object Model), notably pointer comparison and monitor-per-object. Tuples would help also. The MLVM project needs to work on this stuff; the key shortage is in the most valuable resource, which is people (esp. HotSpot hackers) to think and work on it.

> For phased computations, the main economies apply
> to "phased" variables -- those that may take a
> new value on each phase but are constant within phases.

Thanks for this concept.

> Sadly enough, I don't know of a recipe determining which
> of these options is best in a given situation. (Although
> I do know one of the ingredients: whether you require
> atomicity of updates across all parties upon advance.)
> Maybe some further study and exploration could arrive at a
> good recipe.

In the JSR 292 case, we are supporting languages (like Java!) which can dynamically change their type schemas, but do so infrequently, perhaps at phase changes. The changes will in general require complex sets of call sites to be updated, implying a dependency mechanism (see the Switcher class for this). But the actual change-over can be slow, and may involve safepoints, as today with devirtualization that happens when new classes are loaded.

(See Universe::flush_dependents_on around line 1100 of:
http://hg.openjdk.java.net/jdk7/jdk7/hotspot/file/tip/src/share/vm/memory/universe.cpp )

The idea is that the type schema change would be prepared (as a "next global state") by a writer thread #1. At this point some call sites become "invalid in next state". Those call sites would be switched, either by a switcher (which would in general control many call sites at once) or by editing them individually, and then calling 'sync' on them.

The new state of all call sites must at this point be consistent with both the current and next global state of the type schema. Typically, a lock would be held on the schema by writer thread #1, and the updated call site paths (now switched in and sync-ed) would pile up on the schema lock. After the next global state is installed as current, the writer thread #1 lets go of the lock, and callers are free to shake out the new state of affairs.

The shake-out may include installing optimized call paths in various call sites. This means that a mutable call site might undergo two mutations, the first to get into a state consistent with both old and new schemas (perhaps seizing a lock, as above) and the second to optimize calling within the new state. Both transitions generally need to grab a lock so they can have a consistent view of the global schema state.

As an extra complication, the editing of call sites is probably performed in a concurrent manner, and lazily. There might be writer thread #2, etc., performing updates to mutable call sites. It is not clear to me whether the second write also needs a sync, but it looks like that may be the case to get performance. Perhaps the right way to go is to avoid the second write, have the first write install the optimized path immediately, and queue a 'sync' request to be executed after a short delay. One key concern is to avoid "safepoint storms", by batching 'sync' requests.
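
One way that batching might look at the library level (illustrative only: written against MutableCallSite.syncAll as the 'sync' operation eventually shipped in java.lang.invoke, with a made-up delay policy):

```java
import java.lang.invoke.MutableCallSite;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class SyncBatcher {
    private final Set<MutableCallSite> pending = ConcurrentHashMap.newKeySet();
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    SyncBatcher(long delayMillis) {
        // One periodic flush instead of one safepoint per setTarget.
        scheduler.scheduleWithFixedDelay(this::flush, delayMillis, delayMillis,
                                         TimeUnit.MILLISECONDS);
    }

    void requestSync(MutableCallSite site) {   // cheap; called by writer threads
        pending.add(site);
    }

    void flush() {                             // one syncAll covers the batch
        if (pending.isEmpty()) return;
        List<MutableCallSite> batch = new ArrayList<>();
        for (Iterator<MutableCallSite> it = pending.iterator(); it.hasNext(); ) {
            batch.add(it.next());
            it.remove();
        }
        MutableCallSite.syncAll(batch.toArray(new MutableCallSite[0]));
    }

    void shutdown() { scheduler.shutdown(); }
}
```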

> In the mean time, note that only one of these options
> (explicit current+next) even has a reasonable interpretation
> in existing JVM-ese. The others are mismatched in that
> some of the operations have JMM-volatile semantics and some
> don't, so it doesn't make sense to declare the variable
> as either volatile or non-volatile.
>
> John's proposal for MutableCallSite falls into this category,
> hence the controversy about how to express it.
> It is a specialization of the onAdvance (aka sync()) approach
> with in-place updates, and safepoint generations serving
> as phases. This seems reasonable given the likely need for
> atomicity during safepoints anyway.

I'd like to build both on top of LValue operations, but not in JDK 7.

Someone with a thesis to write should propose some conservative additions to the JMM to operate on LValues (or something like them). They can start with nonce-volatiles, but there are probably better ideas waiting to be recognized.

> While ideally it would be nice to integrate this with
> other phased computation even now (I can imagine defining
> adjunct classes for use with j.u.c.Phaser), I'm not too
> tempted to do so right now, because it is not always the
> best available option.

Let's keep thinking about building blocks, for inclusion in a JDK 7+O(1).

> One
> way to help reach this goal is to improve JVM-level support
> so that the most structured parallel code is also the fastest
> parallel code.

Yes. If the fastest code is unstructured, the fastest code will be unreasonably expensive in most cases. Which would be dumb economics.

-- John

Rich Hickey

Dec 11, 2010, 9:18:21 AM12/11/10
to JVM Languages


On Dec 10, 9:04 pm, John Rose <john.r.r...@oracle.com> wrote:
> On Dec 10, 2010, at 5:08 AM, Doug Lea wrote:
>
> > On 12/09/10 19:09, John Rose wrote:
> >> I started a thread on Google Groups to get more advice on safepoint-based
> >> invalidation, which the EG is naming MutableCallSite#sync.
>
> >>http://groups.google.com/group/jvm-languages/browse_thread/thread/9c9...
>
> > TL;DR version: The scheme you laid out here seems sound.
>
> Thanks very much, Doug.  I assume you are primarily referring to part (a) of my question:
>
> >> In order to work, the specification has to (a) make logical sense in the terms of the JMM, (b) be reasonably implementable by JVMs, and (c) be useful to programmers.
>
> Does anyone have a take on (c)?  (See my comments below on using this API...)
>
>
> In the JSR 292 case, we are supporting languages (like Java!) which can dynamically change their type schemas, but do so infrequently, perhaps at phase changes.  The changes will in general require complex sets of call sites to be updated, implying a dependency mechanism (see the Switcher class for this).  But the actual change-over can be slow, and may involve safepoints, as today with devirtualization that happens when new classes are loaded.
>
> (See Universe::flush_dependents_on around line 1100 of:
>  http://hg.openjdk.java.net/jdk7/jdk7/hotspot/file/tip/src/share/vm/me...)
>
> The idea is that the type schema change would be prepared (as a "next global state") by a writer thread #1.  At this point some call sites become "invalid in next state".  Those call sites would be switched, either by a switcher (which would in general control many call sites at once) or by editing them individually, and then calling 'sync' on them.
>
> The new state of all call sites must at this point be consistent with both the current and next global state of the type schema.  Typically, a lock would be held on the schema by writer thread #1, and the updated call site paths (now switched in and sync-ed) would pile up on the schema lock.  After the next global state is installed as current, the writer thread #1 lets go of the lock, and callers are free to shake out the new state of affairs.
>
> The shake-out may include installing optimized call paths in various call sites.  This means that a mutable call site might undergo two mutations, the first to get into a state consistent with both old and new schemas (perhaps seizing a lock, as above) and the second to optimize calling within the new state.  Both transitions generally need to grab a lock so they can have a consistent view of the global schema state.
>
> As an extra complication, the editing of call sites is probably performed in a concurrent manner, and lazily.  There might be writer thread #2, etc., performing updates to mutable call sites.  It is not clear to me whether the second write also needs a sync, but it looks like that may be the case to get performance.  Perhaps the right way to go is to avoid the second write, have the first write install the optimized path immediately, and queue a 'sync' request to be executed after a short delay.  One key concern is to avoid "safepoint storms", by batching 'sync' requests.
>
> -- John

Thanks for this work, John!

I'm concerned about the separation of update and publication in this
API. It seems intractably difficult to reason about the behavior of
the system between multiple setTarget() calls and a subsequent sync().
I'd much rather see (if possible):

public static void syncTargets(MutableCallSite[] sites, MethodHandle[]
newTargets)

Then we can create arbitrary composite updates, out of view and over
time, using mere-mortals programming, and deliver them atomically.
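
For comparison, the non-atomic composite that the weaker primitives allow can be spelled out (using java.lang.invoke as the 'sync' operation eventually shipped, as MutableCallSite.syncAll; the syncTargets helper below is merely a mock-up of the proposed signature, and notably does NOT deliver the atomic visibility being asked for):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MutableCallSite;

public final class SyncTargetsSketch {
    // Mock-up of the proposed signature, built from the weaker primitives.
    // NOT atomic: a reader may observe any prefix of the setTarget calls
    // before syncAll forces every thread to notice the full batch.
    public static void syncTargets(MutableCallSite[] sites, MethodHandle[] newTargets) {
        for (int i = 0; i < sites.length; i++) {
            sites[i].setTarget(newTargets[i]);   // individually, racily visible
        }
        MutableCallSite.syncAll(sites);          // then flush stale targets
    }

    public static void main(String[] args) throws Throwable {
        MutableCallSite site =
            new MutableCallSite(MethodHandles.constant(int.class, 1));
        MethodHandle invoker = site.dynamicInvoker();

        System.out.println(invoker.invoke());    // 1
        syncTargets(new MutableCallSite[] { site },
                    new MethodHandle[] { MethodHandles.constant(int.class, 2) });
        System.out.println(invoker.invoke());    // 2
    }
}
```

This makes the concern concrete: between the loop and syncAll, mixed old/new target combinations are visible to other threads.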

Ditto Switchers (if they would even still be needed). And I wonder
about VolatileCallSite. Who would ever use that, given
MutableCallSite?

Perhaps setTarget() should simply go away?

Regards,

Rich

John Rose

Dec 15, 2010, 12:54:00 AM12/15/10
to jvm-la...@googlegroups.com
FTR: The tail of this conversation forked here:
http://mail.openjdk.java.net/pipermail/mlvm-dev/2010-December/002173.html

We'd be happy to take further comments on this alias also.

-- John

On Dec 11, 2010, at 6:18 AM, Rich Hickey wrote:

> Thanks for this work, John!
>
> I'm concerned about the separation of update and publication in this

> API...

John Rose

Dec 15, 2010, 1:31:59 AM12/15/10
to Da Vinci Machine Project, Rich Hickey, Brian Goetz, Doug Lea, Cliff Click, jsr-2...@jcp.org, jvm-la...@googlegroups.com
On Dec 13, 2010, at 12:19 AM, Rémi Forax wrote:
> On 12/12/2010 05:02 PM, Rich Hickey wrote:
>> Rémi's synchronized block only coordinates the activities of updaters.
>> Other threads may, and in some cases may have to, see some of the
>> setTarget results prior to the sync call, which could be a mess. The
>> point of syncTargets is to make the setting and the visibility atomic,
>> which synchronized cannot do.

Yes, syncTargets would be very powerful. But it would also be hard to
require of all JVMs. Speaking only for HotSpot, we don't do everything
with safepoints, because they are expensive. We use racy updates
whenever we can get away with it. The cost of a racy update is the
co-existence of two states, unpredictably visible to various threads.
I think that's a normal complexity cost for doing good language
implementations.

I think mapping a global atomic update to the JMM would require
more "magic edges" in the happens-before graph. The proposal
I posted, while weaker, has a correspondingly simpler impact on
the JMM. This is another way of observing that JVMs are likely to
have an easier time of adding the proposed functionality.

So a globally atomic update is harder to implement and harder
to specify. It is also overkill for a common use case, which is
delayed optimization of call sites. See below...

> Rich,
> I don't think you can provide an optimized method handle when syncing but
> more probably a default generic method that will later installs a fast path.

Thanks, Remi, for explaining this. I'm going to pile on here.

(I have one comment on your code; see below.)

I think of the pattern Remi sketches as a 2-phase commit.

Phase 0 and phase 2 are long-term phases. Phase 1 is brief but not atomic.

Phase 0 is the reign of the old target T0, before any MCS.setTarget.

Phase 1 starts when metadata locks are grabbed.
Under the locks, MCS.setTarget installs a default generic method T1.
(This is analogous to the JVM trick of setting a call site to the "unlinked" state.)

T1 is not racy. It is careful to grab a reader lock on metadata.
It is likely to install an optimized method T2, via a simple setTarget.
This may happen after an invocation count, or after user-level profiling.
(Therefore, it does not make sense to try to guess at T2 during phase 1.)

The MCS.sync operation is performed during phase 1, after all
relevant setTargets are done. It has the effect of excluding threads
from observing target T0. (I.e., it "flushes" T0 from the system.)

Phase 2 starts when metadata locks are released. During phase 2,
individual threads eventually execute T1. T1 lazily decides to install T2.
(Or several equivalent versions of T2.)

Threads which observe T1 (because of caching or inlining) will perform
sync actions which will force them to observe more recent call site targets.

During phase 0, T2 cannot be observed. During phase 2, T0 cannot be observed.
The intermediate target T1 can be observed during any phase.

Compare that with Rich's proposed atomic syncTargets operation,
which would exclude phase 1 and target T1, for better and worse.

Another way of comparing syncTargets with setTarget+sync is
simply that syncTargets excludes target T1 from phase 0, whereas
the weaker proposal does not exclude T1.

This weakness can also be described in terms of two reader
threads, a Fast Reader and a Slow Reader. The Fast Reader
sees the result of the writer's setTarget of T1 in the same
nanosecond. The Slow Reader sees only T0 until it is
forced to pick up T1 by the sync operation.

-- John

> So if the result of setTarget is visible before the sync call, it will
> execute
> a default generic method which will also enter in a synchronized block.
> This will effectively coordinate things between readers and writers.
>
> invariant broken (writer):
> synchronized(masterLock) {
> // mutate meta data
> // update all callsites
> foreach(impacted callsites) {
> callsite.setTarget(callsite.defaultGenericPath);
> }
> sync(impacted callsites);
> }
>
> default generic method (reader):
> synchronized(masterLock) {
> // check arguments
> // use meta data to create a fast path
> }

(This next line should also be synchronized. -- JRR)

> setTarget(guard + fastpath);
>
>
> This pattern intends to mimick the way the VM do lazy deoptimization
> i.e mark the callsite as deoptimized and later when the callsite is used
> re-create a fast path.

Rémi Forax

Dec 15, 2010, 10:04:16 AM12/15/10
to jvm-la...@googlegroups.com
On 12/15/2010 07:31 AM, John Rose wrote:

> On Dec 13, 2010, at 12:19 AM, Rémi Forax wrote:
>> On 12/12/2010 05:02 PM, Rich Hickey wrote:
>>> Rémi's synchronized block only coordinates the activities of updaters.

>>> Other threads may, and in some cases may have to, see some of the
>>> setTarget results prior to the sync call, which could be a mess. The
>>> point of syncTargets is to make the setting and the visibility atomic,
>>> which synchronized cannot do.
> Yes, syncTargets would be very powerful. But it would also be hard to
> require of all JVMs. Speaking only for HotSpot, we don't do everything
> with safepoints, because they are expensive. We use racy updates
> whenever we can get away with it. The cost of a racy update is the
> co-existence of two states, unpredictably visible to various threads.
> I think that's a normal complexity cost for doing good language
> implementations.
>
> I think mapping a global atomic update to the JMM would require
> more "magic edges" in the happens-before graph. The proposal
> I posted, while weaker, has a correspondingly simpler impact on
> the JMM. This is another way of observing that JVMs are likely to
> have an easier time of adding the proposed functionality.
>
> So a globally atomic update is harder to implement and harder
> to specify. It is also overkill for a common use case, which is
> delayed optimization of call sites. See below...
>
>> Rich,
>> I don't think you can provide an optimized method handle when syncing but
>> more probably a default generic method that will later installs a fast path.
> Thanks, Remi, for explaining this. I'm going to pile on here.
>
> (I have one comment on your code; see below.)

[...]

>> So if the result of setTarget is visible before the sync call, it will
>> execute
>> a default generic method which will also enter in a synchronized block.
>> This will effectively coordinate things between readers and writers.
>>
>> invariant broken (writer):
>> synchronized(masterLock) {
>>     // mutate meta data
>>     // update all callsites
>>     foreach(impacted callsites) {
>>         callsite.setTarget(callsite.defaultGenericPath);
>>     }
>>     sync(impacted callsites);
>> }
>>
>> default generic method (reader):
>> synchronized(masterLock) {
>>     // check arguments
>>     // use meta data to create a fast path
>> }
> (This next line should also be synchronized. -- JRR)
>
>> setTarget(guard + fastpath);

Correct me if I'm wrong.
The last line can be in the synchronized block, but it doesn't have to be.
The JMM guarantees that the current thread will see the new target (T2).
Other threads can see T1 and in that case will update to a T2' which is
semantically equivalent to T2.
It's a racy update.

>>
>> This pattern intends to mimic the way the VM does lazy deoptimization,
>> i.e. mark the callsite as deoptimized and later, when the callsite is used,
>> re-create a fast path.

Rémi

Rich Hickey

unread,
Dec 15, 2010, 12:22:05 PM12/15/10
to John Rose, Da Vinci Machine Project, Brian Goetz, Doug Lea, Cliff Click, jsr-2...@jcp.org, jvm-la...@googlegroups.com

This API has several presumptions:

- There is some singular global metadata.

- It is protected by a lock. What if it is an immutable structure in
an AtomicReference and modified via CAS? Or passed by value to targets/
handles?

- Call sites will always use a two-phase update, i.e. they will need
to rediscover their binding vs directly target it (during metadata
update) given their cached data and the new metadata. You are
piggybacking on this presumption to provide a consistency hook
(blocking on the read lock).

While T1 is not strictly MT-racy (given wholesale adoption of this
pattern), it is prone to error: should the update phase interleave
ordinary code and setTarget calls, it can see an inconsistent state
(i.e. nothing keeps call sites the update thread encounters from
moving to T1.5 prior to the sync call, since the updater thread
holds the lock). Many dynamic language implementations
use their own dynamic code in a way that could trip over this.

This is an API that only works correctly given an extremely
constrained pattern of use. That's fine, if unfortunate, but needs to
go in the docs I think. It is essential that the pattern be - grab the
lock, then 'modify' all metadata (no setTargets), then update all
targets (no calls through callsites), then sync (under the lock), and
that the targets *must be* stubs that grab the lock and then a) do
nothing (if not lazy) or b) setTarget (under the lock), if lazy.

The phases you describe are derived from this pattern and do not fall
out of the API.

I proposed one thing (syncTargets), and asked one question (does that
obsolete setTarget).

The use case Remi and you describe makes it clear that something like
setTarget (with ordinary field synchronization semantics) would still
have utility for two-phase callsites. It would be overkill to sync
again, presuming fastpath calculation is idempotent and acceptable to
do over in multiple threads. However, it's error prone, as Remi's
example is broken, as you pointed out - setting the target outside of
the lock means the site can (permanently) miss an update.

If syncTargets were possible, you'd require no locks, and you might
want something more like a CAS-based swapTarget for lazy sites.
Coordination around metadata (if any) would be completely orthogonal.

invariant broken (writer):
{
// calculate new metadata
// determine new targets
// update all callsites
syncTargets(impacted callsites, new targets);
}

default generic method (reader):
{
// check arguments
// create a fast path
swapTarget(this?, guard + fastpath);
}
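Neither syncTargets nor swapTarget exists in java.lang.invoke; they are proposals. Purely as a thought experiment, swapTarget could be emulated by pairing a call site with an AtomicReference used as the CAS cell. The comparison is reference identity (method handles have no useful equals), and the emulation is not fully race-free, since the CAS and the setTarget are not atomic together; a real swapTarget would have to make that pair atomic:

```java
import java.lang.invoke.*;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical emulation of the proposed swapTarget.
class SwappableCallSite {
    final MutableCallSite site;
    final AtomicReference<MethodHandle> current;

    SwappableCallSite(MethodHandle initial) {
        this.site = new MutableCallSite(initial);
        this.current = new AtomicReference<>(initial);
    }

    // CAS-style install: only the thread whose expected handle is still
    // current wins, so a lazily re-optimizing reader cannot clobber a
    // target that was installed after the one it read.
    boolean swapTarget(MethodHandle expected, MethodHandle replacement) {
        if (current.compareAndSet(expected, replacement)) {
            // NOTE: not atomic with the CAS above; two successive
            // winners' setTarget calls could still be reordered.
            site.setTarget(replacement);
            return true;
        }
        return false;
    }
}
```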

I'll leave it to your expertise whether this combination is harder to
implement and harder to specify. That certainly dominates. The
benefits of syncTargets and swapTarget from an API perspective are
that, given appropriate regard for the expense of syncTargets - you
can't get it wrong, the API itself delivers the coordination
semantics, there is no elaborate pattern to replicate, and the field
of use is broader.

Rich

John Rose

unread,
Dec 15, 2010, 3:08:23 PM12/15/10
to jvm-la...@googlegroups.com, Brian Goetz, Cliff Click, jsr-2...@jcp.org, Doug Lea, Da Vinci Machine Project
On Dec 15, 2010, at 7:04 AM, Rémi Forax wrote:

>>> default generic method (reader):
>>> synchronized(masterLock) {
>>> // check arguments
>>> // use meta data to create a fast path
>>> }
>> (This next line should also be synchronized. -- JRR)
>>
>>> setTarget(guard + fastpath);
>
> Correct me if I'm wrong.
> Le last line can be in the synchronized block but it don't have to.
> The JMM guarantees that the current thread will see the new target (T2).
> Other threads can see T1 and in that case will update to a T2' which is
> semantically equivalent to T2.
> It's a racy update.

Racy updates have the risk of not getting picked up by Slow Reader threads.

But, as you point out, in this case T1 synchronizes any Slow Reader and forces him to see T2.

So, you're right.

I guess the principle is that racy updates are OK as long as the previous state (T1) is allowed and will eventually force all readers to go to the new state (T2).

-- John

Rich Hickey

unread,
Dec 15, 2010, 4:03:29 PM12/15/10
to jvm-la...@googlegroups.com, Brian Goetz, Cliff Click, jsr-2...@jcp.org, Doug Lea, Da Vinci Machine Project

Why can't T3 happen between the synchronized block and the setTarget
call, and get overwritten with a T1-based decision?

Rich

Rémi Forax

unread,
Dec 15, 2010, 5:27:42 PM12/15/10
to jvm-la...@googlegroups.com, Rich Hickey, Brian Goetz, Cliff Click, jsr-2...@jcp.org, Doug Lea, Da Vinci Machine Project
On 12/15/2010 10:03 PM, Rich Hickey wrote:
>
> On Dec 15, 2010, at 3:08 PM, John Rose wrote:
>

It can. I was wrong.
setTarget() has to be in the synchronized block.

Rémi

John Rose

unread,
Dec 15, 2010, 6:27:56 PM12/15/10
to jvm-la...@googlegroups.com, Rich Hickey, Brian Goetz, Cliff Click, jsr-2...@jcp.org, Doug Lea, Da Vinci Machine Project
On Dec 15, 2010, at 2:27 PM, Rémi Forax wrote:

>> Why can't T3 happen between the synchronized block and the setTarget call, and get overwritten with a T1-based decision?
>>
>> Rich
>>
>
> It can. I was wrong.
> setTarget() has to be in the synchronized block.

Foo; I agree. The T1 update can float into the far future and screw things up (overwriting T3).

In our EG meeting this morning we decided to provide better usage guides in the javadoc.

Rich, I'm going to think a while about your alternative proposal of syncTargets and swapTarget:

On Dec 15, 2010, at 9:22 AM, Rich Hickey wrote:

> The benefits of syncTargets and swapTarget from an API perspective are that, given appropriate regard for the expense of syncTargets - you can't get it wrong, the API itself delivers the coordination semantics, there is no elaborate pattern to replicate, and the field of use is broader.

Quick responses:

SyncTargets would be a true global atomic transaction, with barriers in both directions to floating reads and writes.

If you were willing to update the targets one at a time, but still wanted two-way barriers, you could use VolatileCallSite, and file a performance bug for JVMs to optimize the case of a stable volatile.
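Unlike syncTargets, VolatileCallSite is real API. A minimal sketch of the one-at-a-time alternative John describes, where every setTarget is a volatile write and becomes visible without any explicit sync:

```java
import java.lang.invoke.*;

public class VolatileDemo {
    static String demo() throws Throwable {
        VolatileCallSite site =
            new VolatileCallSite(MethodHandles.constant(String.class, "old"));
        MethodHandle invoker = site.dynamicInvoker();

        String before = (String) invoker.invokeExact();
        // A volatile write: other threads see the new target immediately,
        // at the price of a volatile read on every invocation.
        site.setTarget(MethodHandles.constant(String.class, "new"));
        String after = (String) invoker.invokeExact();
        return before + "/" + after;
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(demo());  // prints "old/new"
    }
}
```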

The CAS idea does not translate well to function pointers, because they are necessarily opaque. (Function equivalence is undecidable...) The Object.equals method on them is pretty meaningless. I think you would need a double-CAS of some sort, which replaced a <descriptor, method handle> pair with a new version.

I think I need a better understanding of how STM techniques are implemented on a generic JVM, and what the pain points are.

-- John
