Huge, unexpected performance overhead (of static methods?)


Shevek

Feb 4, 2019, 9:26:17 PM
to mechanical-sympathy
Hi,

I have a very simple routine which, on some JVMs/systems I have not yet
entirely narrowed down, suffers a 50x slowdown. The code is included
below.

In perf-java-flames, I see:

<clinit> -> readVarintTable (90%), of which:
readVarintTable -> readVarint (4%)
readVarintTable -> resolve_static_call -> libjvm.so (86%) <-- THE WAT?

So what is a perfectly trivial method doing spending 90% of its time
inside resolve_static_call? What's going on?

My google searches turned up a note to do with loop unrolling and some
optimizations breaking for static methods, so I will try this with
non-static methods.

Slow on openjdk 1.8.0_191, Xeon E-2176M (Lenovo P1 laptop, 12-thread)
Fast on openjdk 1.8.0_191, Core i7-5600U (Lenovo T550 laptop, 4-thread)
I think Fast on Xeon E5620 (Supermicro rack, 8 thread).
I think Slow on AMD Epyc 7301 16-core, 64-thread. Will investigate more.

Knocking off the obvious:
* It's not doing meaningful amounts of allocation, and no GC.
* Total data size is 100M-1G.
* Both machines running same code, same dataset, ...
* This is single-threaded, and runs early in the JVM startup.
* It's doing I/O over JAR-resource -> BufferedInputStream ->
DataInputStream but it's not I/O contended, based on the calls in the
flamegraph.

But I feel that a 50x slowdown in an unexplained native call because of
... what, the number of cores ... bears some explanation. I can post the
flamegraphs if that helps.

And here is the code, which is as boring as anything, so what gives:


/** Reads a little-endian varint with no optimization for negative numbers. */
private static int readVarint(@Nonnull DataInputStream in) throws IOException {
    int result = 0;
    for (int shift = 0; shift < 32; shift += 7) {
        int b = in.read();
        if (b == -1)
            throw new EOFException("Truncated varint in stream.");
        result |= (b & 0x7f) << shift;
        if ((b & 0x80) == 0)
            return result;
    }
    throw new IOException("Malformed varint in stream.");
}

@Nonnull
private static int[] readVarintTable(@Nonnull DataInputStream in,
        @Nonnegative int sublength) throws IOException {
    int[] out = new int[readVarint(in) * sublength];
    for (int i = 0; i < out.length; i++)
        out[i] = readVarint(in);
    return out;
}

static {
    try {
        DataInputStream in = new DataInputStream(
                new BufferedInputStream(
                        Parser.class.getResourceAsStream("Parser.dat")));
        table = readVarintTable(in);
    } // etc

Shevek

Feb 4, 2019, 10:54:54 PM
to mechanical-sympathy
Update: I now think this is slow (but not AS slow) on the Core i7-5600U
so this may be a regression from _181 to _191, and not entirely
CPU-dependent?

Wrapping the two static methods in an otherwise-pointless class, and
calling them as instance methods made the code much faster.
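For reference, a minimal sketch of that wrapper (the class and method names here are mine, not from the original code; the bodies are the ones from the first post, minus the annotations to keep it self-contained):

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical "otherwise-pointless" wrapper class: the same two methods,
// but as instance methods of a tiny class whose own <clinit> does no work.
final class VarintReader {

    // Reads a little-endian varint, exactly as in the static version.
    int readVarint(DataInputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            int b = in.read();
            if (b == -1)
                throw new EOFException("Truncated varint in stream.");
            result |= (b & 0x7f) << shift;
            if ((b & 0x80) == 0)
                return result;
        }
        throw new IOException("Malformed varint in stream.");
    }

    // Reads a table whose length is itself a varint, times sublength.
    int[] readVarintTable(DataInputStream in, int sublength) throws IOException {
        int[] out = new int[readVarint(in) * sublength];
        for (int i = 0; i < out.length; i++)
            out[i] = readVarint(in);
        return out;
    }
}
```

The caller's static initializer then does `new VarintReader().readVarintTable(in, ...)` instead of the static call.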

Is it relevant that the class in question is 522,419 bytes in size and
contains 1696 (mostly instance) methods? No individual method in it is
larger than 8 KB, so they all JIT.

The outer readVarintTable method is called about 100K-500K times, so
there's plenty of chance to replace it.

No synchronization is used.

I'm still in "WAT?" territory.

S.

Todd Lipcon

Feb 4, 2019, 11:01:18 PM
to mechanica...@googlegroups.com
Tried looking at LogCompilation output with jitwatch? It's been helpful for me in the past to understand why something wouldn't get jitted.

Todd

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Shevek

Feb 5, 2019, 12:13:23 AM
to mechanica...@googlegroups.com
This isn't a JIT issue. According to perf-java-flames, all my code DID
get jitted. The overhead is entirely calls to this mystery
resolve_static_call function, so it looks like a static method lookup
issue in the JVM. The shape of the stack profile makes it look as if
something is recursive, too.

Todd Lipcon

Feb 5, 2019, 1:54:03 AM
to mechanica...@googlegroups.com
On Mon, Feb 4, 2019 at 9:13 PM Shevek <goo...@anarres.org> wrote:
> This isn't a JIT issue. According to perf-java-flames, all my code DID
> get jitted. The overhead is entirely calls to this mystery
> resolve_static_call function, so it looks like a static method lookup
> issue in the JVM. The shape of the stack profile makes it look as if
> something is recursive, too.

Are you sure? From the code it certainly looks like 'resolve_static_call' is part of the interpreter code path.

-Todd
 

Shevek

Feb 6, 2019, 5:54:29 PM
to mechanica...@googlegroups.com, Todd Lipcon
Hmm... "Justify/explain your beliefs." Fair, I'm game:

I think I'm sure that my methods are both JIT'd because perf-java-flames
shows them both in green, and I think that's derived from some prefix on
the symbol name where the JVM (via perf) says it's JIT'd. If it weren't
JIT'd, I'd see it listed in red as "Interpreter".

I waited for the system to warm up before grabbing the flame-chart, and
on one run, I was too early, so I actually saw one symbol shift from
interpreted to JIT. I have not used jitwatch, but the signs I am looking
at seem reliable. The JIT compiled the inner method first, then the
outer one, as expected based on call frequency.

I've never seen this resolve_static_call method show up before in any of
my profiling of Java code.

From that PoV, it looks as if the interpreter is doing something odd
with the method call between the two methods, even though it should just be
a static linkage. Like it JIT'd both methods, but didn't JIT the call,
or it's using some interpreter-path code in the call? So is there a
linkage limit based on #methods in the class, or something? Because
making the methods non-static ALSO moved them into a new class which has
only 2 methods... ok, new experiment...

For what it's worth, I seem to be really struggling with JDK 1.8.0_191
performance on the new Xeon hardware for other reasons, too. I'm seeing
perf saying pthread_cond_wait -> native_write_msr is taking 50% of
runtime, and not even sure where to start with that except limit the
life of any JVM to 6 hours and restart it. I kind of want to blame a
Kernel / PMU change but it only affects the JVM.

Caveat: I don't do JVM internals, I'm mostly a JLS-layer muggle.

S.

Nitsan Wakart

Feb 7, 2019, 2:32:20 AM
to mechanica...@googlegroups.com
Profile it with Solaris Studio (yes, you can!); this should give you insight at the assembly level as well.
The code may have been compiled by C1 or C2, but you are correct that if you see a symbol via perf-map-agent then it is definitely compiled.
Can you reproduce the issue in a minimal JMH benchmark?
Is the system actually slow, or is it just a strange profile?
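A minimal reproduction along those lines (plain Java rather than JMH, since a class's <clinit> only runs once; the class and method names are mine) might look like:

```java
// Sketch: time calls to a static method made from inside the calling class's
// own <clinit>, then again from ordinary code after initialization finishes.
// On JDK 8u builds carrying the JDK-8215634 fix, the in-<clinit> timing is
// expected to be much larger, because each invokestatic from the
// still-initializing class takes the slow resolution path.
final class ClinitBench {
    static final long IN_CLINIT_NANOS;
    static int sink;                      // defeats dead-code elimination

    static int work(int x) {              // trivial static callee
        return x * 31 + 7;
    }

    static long run(int iters) {
        long t0 = System.nanoTime();
        int s = 0;
        for (int i = 0; i < iters; i++)
            s = work(s);
        sink = s;
        return System.nanoTime() - t0;
    }

    static {
        // These calls happen while ClinitBench is still being initialized.
        IN_CLINIT_NANOS = run(10_000_000);
    }
}
```

Comparing `ClinitBench.IN_CLINIT_NANOS` against a post-initialization `ClinitBench.run(10_000_000)` should show the gap, if the JDK in question is affected.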

Andrei Pangin

Feb 7, 2019, 8:36:06 PM
to mechanical-sympathy
Looks like you run a long computation inside the static initializer <clinit>, right?
The problem is that HotSpot cannot fully optimize static calls while the holder class is not yet completely initialized. Moving the computation to a separate class is the right way to deal with such cases.

The situation got even worse in recent JDK updates. There was a zero-day bug https://bugs.openjdk.java.net/browse/JDK-8215634 that could result in invocation of a static method of an uninitialized class, in violation of the JVMS. The bug was fixed in JDK 8u201 and JDK 11.0.2. However, the fix is terrible from a performance perspective: now the invocation of a static method of an uninitialized class goes through the slow path, in particular through the resolve_static_call() JVM runtime function.

So the recommendation is simple: move heavy logic out of a class that is being initialized.
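Concretely, that fix might look like the following sketch (class names are hypothetical, and the table format is a simplified count-then-entries version of the one in the first post):

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// The heavy loading logic lives in its own helper class. By the time
// TableLoader.load() is entered, TableLoader itself is fully initialized,
// so HotSpot can bind its internal static calls normally instead of sending
// each one through resolve_static_call().
final class TableLoader {

    static int readVarint(DataInputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            int b = in.read();
            if (b == -1)
                throw new EOFException("Truncated varint in stream.");
            result |= (b & 0x7f) << shift;
            if ((b & 0x80) == 0)
                return result;
        }
        throw new IOException("Malformed varint in stream.");
    }

    // Reads a varint count, then count * sublength varint entries.
    static int[] load(InputStream raw, int sublength) throws IOException {
        DataInputStream in = new DataInputStream(new BufferedInputStream(raw));
        int[] out = new int[readVarint(in) * sublength];
        for (int i = 0; i < out.length; i++)
            out[i] = readVarint(in);
        return out;
    }
}
```

The original Parser's <clinit> then shrinks to a single delegating call such as `table = TableLoader.load(Parser.class.getResourceAsStream("Parser.dat"), sublength);`, so the calls that dominate the profile are no longer calls into a still-initializing class.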

- Andrei 

четверг, 7 февраля 2019 г., 1:54:29 UTC+3 пользователь Shevek написал:

Francesco Nigro

Feb 8, 2019, 2:58:39 AM
to mechanica...@googlegroups.com
I suspect that the build phase of an app using Graal AOT could suffer from the same limitation, but it is correct that static initializers *shouldn't* be used for heavyweight computations, given that they are guaranteed to run serially...
