slow downs in invokedynamic native code


Jochen Theodorou

unread,
Mar 15, 2016, 5:28:41 AM3/15/16
to jvm-la...@googlegroups.com
Hi,


One of our users has a web application using Groovy with indy activated
and describes the following problem:

> At random intervals and at random times our web servers will go from serving responses in the 300 ms range to taking 30 seconds or more. Sometimes the servers will recover, sometimes they require a restart of the webserver (spring boot/tomcat). When the applications slow down we always see the tomcat thread pool hit the maximum size. Every single thread in the thread pool is in the RUNNABLE state but appears to be making no progress. Successive thread dumps show that the stacks are changing, but VERY slowly. The top of the stack is always this method:
>
> at java.lang.invoke.MethodHandleNatives.setCallSiteTargetNormal(Native Method).
>
> The other common condition is that whatever application code is on the stack is always dynamically compiled. Code that is @CompileStatic is NEVER on the stack when we see these slowdowns.
>
> The thread dumps showed that the application code is never waiting on locks, socket reads, db connections, etc.

Maybe worth mentioning that code annotated with @CompileStatic does not
use invokedynamic at all...

Anyway... I am wondering if anyone has seen something like this before.
My first reaction to this description would be a bug in the JVM, or a
performance bottleneck in the JVM, but spending literally seconds in
native code is pretty bad in any case. On the other hand, I don't have
this web application here to experiment with. The only additional detail
I did hear is that removing the indy version of Groovy seems to fix the
problem. So it must be either our use of indy, or indy itself having a
problem here. But beyond that conclusion I am quite at a loss.

Question 1: Did anyone have a similar problem before?
Question 2: Maybe more one for the JVM engineers: is it even possible for
the indy machinery to suddenly take seconds on compilation, or especially
in the mentioned native method?

bye Jochen

Hannes Wallnöfer

unread,
Mar 16, 2016, 6:28:34 AM3/16/16
to jvm-la...@googlegroups.com
Hi Jochen,

we recently had a report on nashorn-dev that could be related. A user is re-evaluating the same or similar code and seeing a more than 20x slowdown compared to the first evaluation.

http://mail.openjdk.java.net/pipermail/nashorn-dev/2016-March/006024.html

The thing is that he is using fresh ScriptEngines for the second evaluation, so the Nashorn engines should not share anything. As in your case, jvisualvm shows that 80% of the time is spent in java.lang.invoke.MethodHandleNatives.setCallSiteTargetNormal().

Hannes





Duncan MacGregor

unread,
Mar 16, 2016, 6:44:50 AM3/16/16
to jvm-la...@googlegroups.com
I haven't seen this, but setCallSiteTargetNormal does have to get the compiler lock, so contention can definitely cause problems. Is there a chance you're repeatedly invalidating and setting targets? Or generating lots of new mutable call sites?

The other possibility is that the data structures that store the target information aren't scaling, but I haven't seen a big problem there before, and Magik on Java apps tend to be large, so I'd expect to have hit any such problems by now.

Duncan.

Hannes Wallnöfer

unread,
Mar 16, 2016, 6:52:43 AM3/16/16
to jvm-la...@googlegroups.com
For the Nashorn report, the only thing we know is that it involves pretty large scripts that are being re-evaluated in new ScriptEngines, with 8 engines at a time. So it seems quite possible that some implementation detail is stressed beyond the point where it performs efficiently.

Hannes

Remi Forax

unread,
Mar 16, 2016, 7:41:35 AM3/16/16
to jvm-la...@googlegroups.com, Da Vinci Machine Project
The symptoms really look like a deoptimization storm:
setCallSiteTargetNormal goes to a safepoint (which is worse than only taking
the compiler/JIT lock, because all threads are stopped) when either code
calls setTarget or a SwitchPoint is invalidated.

You have a deopt storm when the JIT compiles code that contains a callsite that is always invalidated, so the VM enters a loop like this:
    JIT compile a blob
    execute the blob
    deopt
    jump back in the interpreter
    rinse and repeat

The root cause is a bug in the invalidation logic of the language runtime
(not the VM), but it's hard to spot without a reproducible test case,
because when the JIT compiles a blob of code there are several callsites
inside that blob and usually only one of them is the faulty one.
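
To make that loop concrete, here is a rough self-contained Java sketch of
the pattern (nothing Groovy- or Nashorn-specific, just a MutableCallSite
whose target is reset on almost every call, so whatever the JIT compiles
around the "constant" target is immediately thrown away again):

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical stand-in for a language runtime's callsite: the target is
// re-pointed on every receiver change, which is exactly what triggers the
// JIT-compile / deopt / back-to-interpreter cycle described above.
public class DeoptStormSketch {
    static final MethodType CALL_TYPE = MethodType.methodType(int.class, Object.class);
    static final MutableCallSite SITE = new MutableCallSite(CALL_TYPE);
    static final MethodHandle INVOKER = SITE.dynamicInvoker();

    static int barX(Object receiver) { return 1; }
    static int barY(Object receiver) { return 2; }

    public static void main(String[] args) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle x = lookup.findStatic(DeoptStormSketch.class, "barX", CALL_TYPE);
        MethodHandle y = lookup.findStatic(DeoptStormSketch.class, "barY", CALL_TYPE);
        int sum = 0;
        for (int i = 0; i < 10_000_000; i++) {
            // naive monomorphic "cache": rebind the target every time the type alternates
            SITE.setTarget((i & 1) == 0 ? x : y);
            sum += (int) INVOKER.invokeExact(new Object());
        }
        System.out.println(sum);
    }
}

Running something like this with -XX:+PrintCompilation should show the
recompilation churn around the caller.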

We have already discussed this point several times. John is a proponent of
marking the callsite as one that should never be optimized again, which at
least stops the storm, but it sweeps the real cause of the bug under the
carpet. I would prefer to treat these kinds of bugs as language runtime
bugs that should be investigated by the runtime developers.

Perhaps a middle ground is to mark the callsite as not compilable anymore *and* emit a warning (like when the code cache is full) to not hide the root cause of the bug.

Rémi


Jochen Theodorou

unread,
Mar 16, 2016, 8:27:17 AM3/16/16
to jvm-la...@googlegroups.com


On 16.03.2016 11:44, Duncan MacGregor wrote:
> I haven't seen this, but setCallSiteTargetNormal does have to get the
> compiler lock, so contention can definitely cause problems. Is there a
> chance you're repeatedly invalidating and setting targets? Or generating
> lots of new mutable call sites?

In Groovy all callsites are mutable, so most probably yes. They have to
be, otherwise I could not react to dynamic types.

bye Jochen

Jochen Theodorou

unread,
Mar 16, 2016, 8:45:20 AM3/16/16
to jvm-la...@googlegroups.com


On 16.03.2016 12:41, Remi Forax wrote:
> The symptoms really look like a deoptimization storm:
> setCallSiteTargetNormal goes to a safepoint (which is worse than only
> taking the compiler/JIT lock, because all threads are stopped) when either
> code calls setTarget or a SwitchPoint is invalidated.
>
> You have a deopt storm when the JIT compiles code that contains a
> callsite that is always invalidated, so the VM enters a loop like this:
>     JIT compile a blob
>     execute the blob
>     deopt
>     jump back in the interpreter
>     rinse and repeat
>
> The root cause is a bug in the invalidation logic of the language runtime
> (not the VM), but it's hard to spot without a reproducible test case,
> because when the JIT compiles a blob of code there are several callsites
> inside that blob and usually only one of them is the faulty one.

yeah, the test case is a problem... assuming you have code like this

def foo(X x) {
    x.bar() // callsite to discuss here!
}
class X { def bar() { 1 } }
class Y extends X { def bar() { 2 } }

10_000_000.times {
    foo(new X())
    foo(new Y())
}

and assuming the logic is that you always call X::bar for instances of X
and Y::bar for instances of Y, meaning in this case you will have to set
the target of the callsite again and again for every call. Would that be
the kind of case you are talking about?

If yes, I would wonder why JIT compilation happens right away here...
that's not what I remember from invokedynamic. If there first have to be
several (1000+) calls of the same type, then it should still be far from a
deoptimization storm, or not?

If this is indeed the kind of bug we are talking about, then the solution
would be to implement caching for polymorphic callsites. On the other hand
I find it interesting that we don't get the problem with non-indy. But
maybe it can be explained by a more eagerly optimizing JIT (or by getting
out of interpreter mode at all).

> We have already discussed this point several times. John is a proponent
> of marking the callsite as one that should never be optimized again,
> which at least stops the storm, but it sweeps the real cause of the bug
> under the carpet.

yeah, I would not like that either.

> I would prefer to treat these kinds of bugs as language runtime bugs
> that should be investigated by the runtime developers.
>
> Perhaps a middle ground is to mark the callsite as not compilable
> anymore *and* emit a warning (like when the code cache is full) to not
> hide the root cause of the bug.

How about showing those faulty callsites in jvisualvm, or having another
inspection tool for that?


bye Jochen

Jochen Theodorou

unread,
Mar 16, 2016, 8:46:44 AM3/16/16
to jvm-la...@googlegroups.com


On 16.03.2016 11:28, Hannes Wallnöfer wrote:
> Hi Jochen,
>
> we recently had a report on nashorn-dev that could be related. A user is
> re-evaluating the same or similar code and seeing a more than 20x
> slowdown compared to the first evaluation.
>
> http://mail.openjdk.java.net/pipermail/nashorn-dev/2016-March/006024.html
>
> The thing is that he is using fresh ScriptEngines for the second
> evaluation, so the Nashorn engines should not share anything. As in your
> case, jvisualvm shows that 80% of the time is spent in
> java.lang.invoke.MethodHandleNatives.setCallSiteTargetNormal().

ok, thanks, good to know there are at least potential issues around (be
it in the JVM or in wrong use by the language)

bye Jochen

Remi Forax

unread,
Mar 16, 2016, 10:44:37 AM3/16/16
to jvm-la...@googlegroups.com


----- Original Message -----
> From: "Jochen Theodorou" <blac...@gmx.org>
> To: jvm-la...@googlegroups.com
> Sent: Wednesday, March 16, 2016 13:45:18
> Subject: Re: [jvm-l] slow downs in invokedynamic native code
>
>
>
let's consider a slightly different code:

def foo(X x) {
    x.bar() // callsite to discuss here!
}
class X { def bar() { 1 } }
class Y extends X { def bar() { 2 } }

1_000.times {
    10_000.times {
        foo(new X())
    }
    foo(new Y())
}

in that case, foo(new X()) is executed enough times to trigger the JIT, and the call foo(new Y()) will ask for a deopt
(and having several threads hammering this code will make the problem worse).

>
> If this is indeed the kind of bug we are talking about, then the
> solution would be to implement caching for polymorphic callsites.

or just stop doing any caching, by having the callsite do a fold + exactInvoker with the combiner of the fold returning a non-constant method handle
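
Roughly like this, as a sketch (selectTarget and its dispatch rule are made
up for the example, they stand in for whatever lookup the runtime does):
the fold runs the combiner on every call to pick a target, the exactInvoker
then invokes it, and since the installed handle never changes there is
nothing for the JIT to speculate on and nothing to deoptimize. The price
is, of course, a lookup on every call.

import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class FoldInvokerSketch {
    static int barX(Object receiver) { return 1; }
    static int barY(Object receiver) { return 2; }

    // Hypothetical runtime lookup, executed on every invocation: it must
    // return a handle of exactly the callsite's type. Nothing here is
    // treated as a constant by the JIT.
    static MethodHandle selectTarget(Object receiver) throws ReflectiveOperationException {
        MethodType type = MethodType.methodType(int.class, Object.class);
        String name = (receiver instanceof String) ? "barY" : "barX"; // made-up dispatch rule
        return MethodHandles.lookup().findStatic(FoldInvokerSketch.class, name, type);
    }

    static CallSite makeCallSite() throws ReflectiveOperationException {
        MethodType callType = MethodType.methodType(int.class, Object.class);
        MethodHandle selector = MethodHandles.lookup().findStatic(
                FoldInvokerSketch.class, "selectTarget",
                MethodType.methodType(MethodHandle.class, Object.class));
        MethodHandle invoker = MethodHandles.exactInvoker(callType);            // (MethodHandle,Object)int
        MethodHandle dispatch = MethodHandles.foldArguments(invoker, selector); // (Object)int
        return new ConstantCallSite(dispatch); // never calls setTarget, so no deopt from this site
    }
}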

> On the other hand I find it interesting that we don't get the problem
> with non-indy. But maybe it can be explained by a more eagerly optimizing
> JIT (or by getting out of interpreter mode at all).
>

without indy, HotSpot doesn't consider the method implementation to be constant, so there is no issue because the JIT will not try to inline it

> > We have already discussed this point several times. John is a proponent
> > of marking the callsite as one that should never be optimized again,
> > which at least stops the storm, but it sweeps the real cause of the bug
> > under the carpet.
>
> yeah, I would not like that either.
>
> > I would prefer to treat these kinds of bugs as language runtime bugs
> > that should be investigated by the runtime developers.
> >
> > Perhaps a middle ground is to mark the callsite as not compilable
> > anymore *and* emit a warning (like when the code cache is full) to not
> > hide the root cause of the bug.
>
> How about showing those faulty callsites in jvisualvm, or having another
> inspection tool for that?

yes, any form of logging; an MBean that exports information about non-compilable callsites is fine too.

>
>
> bye Jochen

Rémi

Remi Forax

unread,
Mar 16, 2016, 10:57:51 AM3/16/16
to Da Vinci Machine Project, Nick Houghton, jvm-la...@googlegroups.com
Hi Benjamin,


De: "Benjamin Sieffert" <benjamin...@metrigo.de>
À: "Da Vinci Machine Project" <mlvm...@openjdk.java.net>
Cc: "Nick Houghton" <nhou...@gmail.com>, jvm-la...@googlegroups.com
Envoyé: Mercredi 16 Mars 2016 13:59:58
Objet: Re: [jvm-l] slow downs in invokedynamic native code

Hi,

I ran into this issue with Nashorn as well. That was back in 2013, but a certain Nick Houghton (cc'd, who will be glad to hear there’s finally a bug filed) contacted me about it in Dec '15, asking whether I had made any progress, because he had discovered the same issue.
Here’s my original thread: http://mail.openjdk.java.net/pipermail/mlvm-dev/2013-September/005489.html
Maybe Jochen or Remi will still remember.

I don't remember... I'm like a goldfish... but the internet saves me (thanks for the link)



Anyways, I think we solved this issue by eliminating a pattern where we would globally define objects in our script environment via calls to NashornScriptEngine.put. I suspect that overriding the same global variable repeatedly, and hence, I suppose, invalidating the corresponding call sites, might have triggered the pathological behaviour.

Just wanted to let you know, in case it helps narrow down the issue / understand the impact.

It's a perfect example of a Nashorn bug (not a VM bug): any global (maybe any constant?) should have a counter counting the number of times the global is changed, and if the counter is above a threshold, the runtime should stop trying to treat it as constant.
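
Something along these lines, as a hand-waved sketch (not Nashorn's actual
code; the names and the threshold are invented): expose the global as a
constant guarded by a SwitchPoint only while it is stable, and once it has
been redefined too often hand out a plain volatile read that the JIT will
never treat as constant.

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.SwitchPoint;

public class GlobalSlot {
    private static final int RELINK_LIMIT = 8; // invented threshold

    private volatile Object value;
    private SwitchPoint switchPoint = new SwitchPoint();
    private int changes;

    public synchronized void set(Object newValue) {
        value = newValue;
        // kill all readers that constant-folded the old value...
        SwitchPoint.invalidateAll(new SwitchPoint[] { switchPoint });
        if (changes++ < RELINK_LIMIT) {
            switchPoint = new SwitchPoint(); // ...and let them relink against the new one
        }
        // past the limit the SwitchPoint stays invalid, so every reader ends up on the slow path
    }

    // Handle a callsite should install to read this global.
    public synchronized MethodHandle readHandle() throws ReflectiveOperationException {
        MethodHandle plainRead = MethodHandles.lookup()
                .findGetter(GlobalSlot.class, "value", Object.class)
                .bindTo(this);
        if (changes >= RELINK_LIMIT) {
            return plainRead; // too unstable: stop pretending it is constant
        }
        MethodHandle asConstant = MethodHandles.constant(Object.class, value);
        return switchPoint.guardWithTest(asConstant, plainRead);
    }
}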

Perhaps we should have a section about deopt storms in the documentation, saying that the runtime code should always have a backup strategy in case the method handle chain is constant enough for the JIT but not constant if you look at the whole program.



Kind Regards,
Benjamin

regards,
Rémi




Hannes Wallnöfer

unread,
Mar 16, 2016, 11:37:05 AM3/16/16
to jvm-la...@googlegroups.com
Hi Benjamin and Rémi,
 
Anyways, I think we solved this issue by eliminating a pattern where we would globally define objects in our script environment via calls to NashornScriptEngine.put. I suspect that overriding the same global variable repeatedly, and hence, I suppose, invalidating the corresponding call sites, might have triggered the pathological behaviour.

Just wanted to let you know, in case it helps narrow down the issue / understand the impact.

It's a perfect example of a Nashorn bug (not a VM bug): any global (maybe any constant?) should have a counter counting the number of times the global is changed, and if the counter is above a threshold, the runtime should stop trying to treat it as constant.

Perhaps we should have a section about deopt storms in the documentation, saying that the runtime code should always have a backup strategy in case the method handle chain is constant enough for the JIT but not constant if you look at the whole program.



Nashorn has always had the feature of turning callsites megamorphic after 8 or 16 (depending on version and options) relink events. In the case that just popped up on nashorn-dev, the user reports that the problem occurs with freshly created ScriptEngines, meaning no code will be reused from the old ScriptEngines.

Now even if we assume some code is reused between engines and needs to be relinked, it's very hard to explain a > 20x performance drop between the first and second evaluation. After all, callsites have to be linked the first time around as well.

Hannes

