Why a Java method invocation is slower when you call it somewhere else in your code?


J Crawford

Apr 8, 2017, 2:13:01 AM
to mechanical-sympathy
Hello,

I think here might be the best place to find someone who may know what’s going on with the JVM in that test: http://stackoverflow.com/questions/43281743/what-happens-inside-the-jvm-so-that-a-method-invocation-in-java-becomes-slower-w

Basically I warm up a method, and then when I call it at another location (a few lines below) it is much slower. I'm trying to understand and mitigate this effect so my method invocation is consistently fast.

Thanks!

-JC

Shevek

Apr 8, 2017, 2:32:32 AM
to mechanica...@googlegroups.com
Without digging into the actual reasons, it's likely that growing a
larger arraylist is slower than growing a smaller one, and since you
never clear it, the later calls will therefore be slower. I would very
much like a benchmark of method call time to be allocation- and
garbage-free, otherwise I really don't know what's being tested.

S.

jez prime

Apr 8, 2017, 4:28:44 AM
to mechanical-sympathy
I think Kirk's answer (he also posts on this group) is extremely valid. I'd certainly agree with him that your warmup isn't necessarily optimising what you think it is, and that your unit of work is too small for valid measurement.

Gil Tene

Apr 8, 2017, 11:31:57 AM
to mechanical-sympathy
The simple answer is that it's not slower. You're just measuring wrong.

Kirk Pepperdine

Apr 8, 2017, 12:04:18 PM
to mechanica...@googlegroups.com

> On Apr 8, 2017, at 5:31 PM, Gil Tene <g...@azul.com> wrote:
>
> The simple answer is that it's not slower. You're just measuring wrong.

I think this is what I was saying, just in a less direct manner. ;-)


Gil Tene

Apr 8, 2017, 12:14:36 PM
to mechanical-sympathy
There is a lot in this code to "learn" from. You can/should obviously use something like jmh to save you much of the trouble of making these mistakes. But if you want to learn about the various mistakes that people commonly make when trying to measure stuff, this code is a good starting point. It is a classic "how not to micro-benchmark" example and a good teaching/learning opportunity. So if we change the title from "Why a Java method invocation is slower..." to "Why does measuring like this give me the wrong impression that...", and putting aside style comments, here are some comments and highlights of the likely reasons why you are seeing a bigger number for the time elapsed in your single invocation from a second call site compared to the 1,000,000 invocations from the earlier call site:

- The first call site does not warm up the method. It warms up the call site. "Warmup" is not a proper term for what you are doing there... The method can be (and likely was) inlined into the call site and optimized there.

- When the second call site is invoked once, it may not have been inlined there yet, and may be calling interpreted or lower-tiered optimized versions of the method. If exercised enough, the second call site would likely inline as well.

Separately interesting:

- Your mySleep won't actually do what you think it does. The entire method can be optimized away to nothing after inlining at the call site by the JIT once the calls to it actually warm up enough, since it has no side effects and nothing is done with its return value.

- Your while(true) loop in the code and the output don't seem to match. Where is the rest of the output? Or is this output from an earlier code version without the while(true) loop?

- You are accumulating more and more contents into an ArrayList. At some point the ArrayList will be resized. That does not yet happen in your specific run (1,000,000 iterations plus 1 still fits in the initial 2^20 size), but it would if you kept running... If you want to measure while exercising a large data structure like this, I'd initialize it all and then wrap around it (with some modulo of length) rather than keep accumulating.

And last, calls to the same Java method at different sites *can* end up being slower or faster depending on many factors. For example, when inlined at the call sites, the parameters passed may affect the optimizations possible, and may vary depending on call site. More importantly, whether or not a method is inlined is call-site dependent. E.g. some inlining depth or overall calling code size threshold may have been exceeded to stop inlining in one site but not in another.
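[Editor's note: the wrap-and-modulo idea above might be sketched like this. The class and names here are illustrative, not from the original code:]

  import java.util.ArrayList;
  import java.util.List;

  public class WrapAroundExample {
      static final int SIZE = 1 << 20;                 // fixed capacity, filled once up front
      static final List<Long> values = new ArrayList<>(SIZE);

      static {
          for (int i = 0; i < SIZE; i++) {
              values.add(0L);                          // pre-populate so the list never grows
          }
      }

      // Overwrite slots in place, wrapping with modulo, instead of accumulating:
      // the list size (and its backing array) stays constant no matter how long we run.
      static void record(long iteration, long value) {
          values.set((int) (iteration % SIZE), value);
      }

      public static void main(String[] args) {
          for (long i = 0; i < 3_000_000L; i++) {
              record(i, i);
          }
          System.out.println(values.size());           // still 1048576; no resize ever happened
      }
  }

This way the measured work exercises a large data structure without the growth/resize cost drifting over time.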

Kirk Pepperdine

Apr 8, 2017, 12:36:58 PM
to mechanica...@googlegroups.com

> On Apr 8, 2017, at 6:14 PM, Gil Tene <g...@azul.com> wrote:
>
> There is a lot in this code to "learn" from.

Agreed….

> [snip]
>
>
> - Your mySleep won't actually do what you think it does. The entire method can be optimized away to nothing after inlining at the call site by the JIT once the calls to it actually warm up enough, since it has no side effects and nothing is done with its return value.

Well, this won’t happen in OpenJDK because of the return value.

>
> - Your while(true) loop in the code and the output don't seem to match. Where is rest of the output? Or is this output from an earlier code version without the while(true) loop?
>
> - You are accumulating more and more contents into an array list. At some point the arraylist will be resized.

It’s pre-sized…

I’m going to play with this before commenting on the rest.

— Kirk

Gil Tene

Apr 8, 2017, 12:39:37 PM
to mechanica...@googlegroups.com


Sent from my iPad

> On Apr 8, 2017, at 9:37 AM, Kirk Pepperdine <ki...@kodewerk.com> wrote:
>
>
>> On Apr 8, 2017, at 6:14 PM, Gil Tene <g...@azul.com> wrote:
>>
>> [snip]
>>
>> - Your mySleep won't actually do what you think it does. The entire method can be optimized away to nothing after inlining at the call site by the JIT once the calls to it actually warm up enough, since it has no side effects and nothing is done with its return value.
>
> Well, this won’t happen in OpenJDK because of the return value.

The return value "saves" you only as long as the method doesn't get inlined. After it is inlined, the fact that the return value isn't used allows the JIT to kill the entire code...


Kirk Pepperdine

Apr 8, 2017, 12:40:46 PM
to mechanica...@googlegroups.com

>>> [snip]
>>
>> Well, this won’t happen in OpenJDK because of the return value.
>
> The return value "saves" you only as long as the method doesn't get inlined. After it is inlined, the fact that the return value isn't used allows the JIT to kill the entire code…

You’d think but not in my experience.

Regards,
Kirk

Gil Tene

Apr 9, 2017, 11:49:39 AM
to mechanica...@googlegroups.com
Stock OpenJDK currently inlines and completely eliminates:

  public static int wasteSomeTime(int t) {
    int x = 0;
    for (int i = 0; i < t * 10000; i++) {
      x += (t ^ x) % 93;
    }
    return x;
  }

When called like this:

  wasteSomeTime(sleepArg);


So return values demonstrably don't prevent the optimization...

The optimization will not happen if inlining of the method at the call site is prevented.

I built a small set of jmh benchmarks to demonstrate this. They result in this:

Benchmark                                             (benchLoopCount)  (sleepArg)   Mode  Cnt           Score           Error  Units
MethodInliningExampleBench.noRetValIntLoop                      100000           1  thrpt    5  2830940580.903 ±  52900090.474  ops/s
MethodInliningExampleBench.noRetValIntLoopNoInlining            100000           1  thrpt    5        5500.356 ±       245.758  ops/s
MethodInliningExampleBench.retValIntLoop                        100000           1  thrpt    5  2877030926.237 ± 134788500.109  ops/s
MethodInliningExampleBench.retValIntLoopNoInlining              100000           1  thrpt    5           0.219 ±         0.007  ops/s


Which demonstrates that when inlining is *prevented* at the caller there is a real difference between having a return value and not (the loop in the method gets optimized away only if there is no return value). But when inlining is not prevented at the caller and the return value is not used, both cases get optimized away the same way.

And since it is "hard" to reliably disallow inlining (without e.g. using Aleksey's cool @CompilerControl(CompilerControl.Mode.DONT_INLINE) annotation in jmh), inlining can bite you and wreck your assumptions at any time...
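[Editor's note: outside of jmh, a crude but common defense is to give the result a visible side effect, e.g. a volatile write, so the JIT cannot prove the work dead. A minimal sketch; the class and field names here are made up for illustration:]

  public class SinkExample {
      // A volatile write is an observable side effect the JIT must preserve, so the
      // loop in wasteSomeTime cannot be proven dead and eliminated, even after inlining.
      static volatile int sink;

      static int wasteSomeTime(int t) {
          int x = 0;
          for (int i = 0; i < t * 10000; i++) {
              x += (t ^ x) % 93;
          }
          return x;
      }

      public static void main(String[] args) {
          sink = wasteSomeTime(1);   // consume the return value instead of dropping it
          System.out.println("sink written");
      }
  }

In jmh itself, consuming results via Blackhole serves the same purpose more robustly.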

Interestingly, as you can see from the same jmh tests above, while stock OpenJDK will optimize away the above code, it *currently* won't optimize away this code:

  public static long mySleepL1(long t) {
    long x = 0;
    for (int i = 0; i < t * 10000; i++) {
      x += (t ^ x) % 93;
    }
    return x;
  }

Which differs only in using longs instead of ints.

The results for the longs tests are:

Benchmark                                              (benchLoopCount)  (sleepArg)   Mode  Cnt           Score           Error  Units
MethodInliningExampleBench.noRetValLongLoop                      100000           1  thrpt    5  2924098828.778 ± 234409260.906  ops/s
MethodInliningExampleBench.noRetValLongLoopNoInlining            100000           1  thrpt    5           0.243 ±         0.013  ops/s
MethodInliningExampleBench.retValLongLoop                        100000           1  thrpt    5           0.254 ±         0.014  ops/s
MethodInliningExampleBench.retValLongLoopNoInlining              100000           1  thrpt    5           0.246 ±         0.012  ops/s



So using longs seems to defeat some of the *current* OpenJDK optimizations. But how much would you want to bet on that staying the same in the next release?

Similarly, *current* stock OpenJDK won't recognize that System.nanoTime() and System.currentTimeMillis() have no side effects, so the original example method:
 
    public static long mySleep(long t) {
        long x = 0;
        for (int i = 0; i < t * 10000; i++) {
            x += System.currentTimeMillis() / System.nanoTime();
        }
        return x;
    }

will not be optimized away at the call site on *current* OpenJDK builds. But this can change at any moment as new optimizations and metadata about intrinsics are added in coming versions or with better optimizing JITs.

In all these cases, dead code *might* be removed. And whether or not it is can depend on the length of the run, the data you use, the call site, the phase of the moon 🌙, or the version of the JDK or JIT that happens to run your code. Any form of comparison (between call sites, versions, etc.) with such dead code involved is flaky, and will often lead to "surprising" conclusions. Sometimes those surprising conclusions happen right away. Sometimes they happen a year later, when you test again using your previously established, tried-and-tested, based-on-experience tests that no longer do what you think they do...

E.g. I'm fairly sure OpenJDK will at some point (soon?) need to catch its optimizations on longs up to its optimizations on ints in many cases (who uses ints anymore, except for array indexing?), which will probably break a lot of benchmarks out there that may be inadvertently relying on long optimizations not happening.


Kirk Pepperdine

Apr 10, 2017, 2:04:10 AM
to mechanica...@googlegroups.com
Hi Gil,

Interesting. I've run stuff in the past where simply adding a return prevented the code from being JIT'ed away. What I've also noted is that any time you stray from int, the expected optimization will not be applied. Something like

public int sum(int a, int b, int n) {
    int temp = 0;
    for (int i = 0; i < n; i++)
        temp += a + b;
    return temp;
}

So, assuming a + b cannot be folded, the next obvious thing is that it is loop invariant and should be hoisted; but change everything to long and there is no hoisting.
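[Editor's note: written out by hand, the hoisting described above would transform the loop roughly as follows. This is a conceptual sketch, not actual JIT output:]

  public class HoistingSketch {
      // Original shape: a + b is recomputed on every iteration.
      static int sumNaive(int a, int b, int n) {
          int temp = 0;
          for (int i = 0; i < n; i++) {
              temp += a + b;
          }
          return temp;
      }

      // After hoisting the loop-invariant a + b out of the loop,
      // which is what the JIT is expected to do for the int version.
      static int sumHoisted(int a, int b, int n) {
          int temp = 0;
          int invariant = a + b;   // computed once, outside the loop
          for (int i = 0; i < n; i++) {
              temp += invariant;
          }
          return temp;
      }
  }

Both methods compute the same result; the hoisted form just does less work per iteration.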

Kind regards,
Kirk
