Spin Loop Hint support in Java (draft JEP)


Gil Tene

Oct 4, 2015, 1:40:34 PM
to mechanica...@googlegroups.com
I just posted a draft JEP focusing on adding a Spin Loop Hint capability to Java to the core-libs-dev and hotspot-dev OpenJDK lists to start discussion. It's a subject that is probably interesting to this group, so I'm including it in richer-text form here. Several of the people on this list have mentioned their wish for PAUSE instruction support in Java over the past couple of years, so we finally did something about it.

There are working prototype JDKs (see links from https://github.com/giltene/GilExamples/tree/master/SpinHintTest for Linux, Mac, and Windows prototype JDKs) that you can play with to see how/if this impacts your own loops, and as you can see from the WebRevs, the changes needed are fairly simple... Kudos to Yuri Gaevsky and Ivan Krylov from Azul for the actual hard work on creating the prototype implementations.

-- Gil.

JEP XYZ: Spin Loop Hint

(suggested content for some JEP fields):
Authors: Gil Tene
Owner: Gil Tene
Type: Feature
Status: Draft
Component: core-libs
Scope: JDK
Discussion: core dash libs dash dev at openjdk dot java dot net
Effort: S
Duration: S

Summary

Add an API that would allow Java code to hint that a spin loop is being executed.

Goals

Provide an API that would allow Java code to hint to the runtime that it is in a spin loop. The API would be a pure hint, and will carry no semantic behavior requirements (i.e. a no-op is a valid implementation). Allow the JVM to benefit from spin loop specific behaviors that may be useful on certain hardware platforms. Provide both a no-op implementation and an intrinsic implementation in the JDK, and demonstrate an execution benefit on at least one major hardware platform.

Non-Goals

It is NOT a goal to look at performance hints beyond spin loops. Other performance hints, such as prefetch hints, are outside the scope of this JEP.

Motivation

Some hardware platforms benefit from software indication that a spin loop is in progress. Some common execution benefits may be observed:

A) The reaction time of a spin loop may be improved when a spin hint is used due to various factors, reducing thread-to-thread latencies in spinning wait situations.

and

B) The power consumed by the core or hardware thread involved in the spin loop may be reduced, benefitting overall power consumption of a program, and possibly allowing other cores or hardware threads to execute at faster speeds within the same power consumption envelope. 

While long term spinning is often discouraged as a general user-mode programming practice, short term spinning prior to blocking is a common practice (both inside and outside of the JDK). Furthermore, as core-rich computing platforms are commonly available, many performance and/or latency sensitive applications use a pattern that dedicates a spinning thread to a latency critical function [1], and may involve long term spinning as well.  
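The "short term spinning prior to blocking" pattern mentioned above could look roughly like the sketch below. It is only an illustration: the class `SpinThenBlock`, its methods, and the local `spinLoopHint()` placeholder are hypothetical names introduced here (the placeholder is a no-op standing in for the proposed hint), and the sketch assumes a single waiting thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

/** Sketch of "spin briefly, then block". All names here are hypothetical. */
final class SpinThenBlock {
    /** Placeholder for the proposed hint; a no-op is a valid implementation. */
    static void spinLoopHint() { }

    private final AtomicBoolean ready = new AtomicBoolean(false);
    private volatile Thread waiter; // assumption: at most one waiting thread

    /** Spins up to maxSpins times, then parks until signal() is called. Returns the spins performed. */
    int await(int maxSpins) {
        int spins = 0;
        while (!ready.get() && spins < maxSpins) {
            spinLoopHint(); // tell the runtime we are busy-waiting
            spins++;
        }
        if (!ready.get()) {
            waiter = Thread.currentThread();
            // Re-check after publishing the waiter so a concurrent signal() is not missed;
            // the loop also covers spurious returns from park().
            while (!ready.get()) {
                LockSupport.park(this);
            }
            waiter = null;
        }
        return spins;
    }

    /** Releases the waiter, whether it is still spinning or already parked. */
    void signal() {
        ready.set(true);
        Thread t = waiter;
        if (t != null) {
            LockSupport.unpark(t);
        }
    }

    public static void main(String[] args) {
        SpinThenBlock gate = new SpinThenBlock();
        new Thread(gate::signal).start();
        int spins = gate.await(10_000);
        System.out.println("released after " + spins + " spins");
    }
}
```

The spin budget (`maxSpins`) is the tuning knob: a small budget falls back to blocking quickly, while a large one approaches the dedicated spinning thread pattern described above.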

As a practical example and use case, current x86 processors support a PAUSE instruction that can be used to indicate spinning behavior. Using a PAUSE instruction demonstrably reduces thread-to-thread round trips. Due to its benefits and commonly recommended use, the x86 PAUSE instruction is commonly used in kernel spinlocks, in POSIX libraries that perform heuristic spins prior to blocking, and even by the JVM itself. However, due to the inability to hint that a Java loop is spinning, its benefits are not available to regular Java code.

We include specific supporting evidence: in simple tests [2] performed on an E5-2697 v2, measuring the round-trip latency between two threads that communicate by spinning on a volatile field, round-trip latencies were demonstrably reduced by 18-20 nsec across a wide percentile spectrum (from the 10%'ile to the 99.9%'ile). This reduction can represent an improvement as high as 35%-50% in best-case thread-to-thread communication latency, e.g. when two spinning threads execute on two hardware threads that share a physical CPU core and an L1 data cache. See example latency measurement results comparing the reaction latency of a spin loop that includes an intrinsified spinLoopHint() call [intrinsified as a PAUSE instruction] to the same loop executed without using a PAUSE instruction [3], along with measurements of the time it takes to perform an actual System.nanoTime() call to measure time.
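For readers who want a feel for the kind of test described here, the following is a minimal ping-pong sketch, not the actual test from [2]. It assumes two threads exchanging a counter through volatile fields and reports only an average round-trip time, whereas the measurements quoted above cover a percentile spectrum. The class name `SpinPingPong` and the no-op `spinLoopHint()` placeholder are hypothetical; with a no-op placeholder the numbers reflect the unhinted loop.

```java
/** Ping-pong sketch: two threads hand a counter back and forth by spinning on volatile fields. */
final class SpinPingPong {
    /** Placeholder for the proposed hint (hypothetical); a no-op here. */
    static void spinLoopHint() { }

    static volatile long ping = -1;
    static volatile long pong = -1;

    /** Runs the given number of round trips and returns the average round-trip time in nanoseconds. */
    static double averageRoundTripNanos(int roundTrips) {
        ping = -1;
        pong = -1;
        Thread responder = new Thread(() -> {
            for (long i = 0; i < roundTrips; i++) {
                while (ping != i) {
                    spinLoopHint(); // wait for the next ping
                }
                pong = i;           // answer it
            }
        });
        responder.start();

        long start = System.nanoTime();
        for (long i = 0; i < roundTrips; i++) {
            ping = i;               // send
            while (pong != i) {
                spinLoopHint();     // wait for the matching pong
            }
        }
        long elapsed = System.nanoTime() - start;

        try {
            responder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return (double) elapsed / roundTrips;
    }

    public static void main(String[] args) {
        // Two spinning threads need two hardware threads to make progress at a useful rate.
        if (Runtime.getRuntime().availableProcessors() < 2) {
            System.out.println("need at least 2 processors for a meaningful run");
            return;
        }
        averageRoundTripNanos(100_000); // warm-up so the loops get JIT-compiled
        System.out.printf("avg round trip: %.1f ns%n", averageRoundTripNanos(1_000_000));
    }
}
```

Results from a loop like this depend heavily on where the two threads are placed (same core, same socket, different sockets), so comparisons with and without a hint are only meaningful under identical placement.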

Description

We propose to add a method to the JDK which would provide a hint that a spin loop is being performed. E.g. jdk.util.PerformanceHints.spinLoopHint(), which will hopefully evolve to a Java SE API, e.g. java.util.PerformanceHints.spinLoopHint(). The specific name space location, class name, and method name will be determined as part of development of this JEP.

An empty method would be a valid implementation of the spinLoopHint() method, but an intrinsic implementation is the obvious goal for hardware platforms that can benefit from it. We intend to produce an intrinsic x86 implementation for OpenJDK as part of developing this JEP. A prototype implementation already exists [4] [5] [6] [7] and results from initial testing show promise.
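To make the shape of the proposal concrete, here is a small sketch of what the no-op form and a call site might look like. It is an assumption-laden illustration: `PerformanceHints` below is a local stand-in in the default package (the draft leaves the final package, class, and method names open), and `SpinWaitExample` is a hypothetical caller invented for this example.

```java
/** Hypothetical caller: one thread publishes a value, another spins until it is visible. */
final class SpinWaitExample {
    private volatile boolean published;
    private int payload; // made visible to the reader by the volatile write to 'published'

    void publish(int value) {
        payload = value;
        published = true;
    }

    int awaitPayload() {
        while (!published) {
            PerformanceHints.spinLoopHint(); // pure hint; the loop is correct without it
        }
        return payload;
    }

    public static void main(String[] args) {
        SpinWaitExample box = new SpinWaitExample();
        new Thread(() -> box.publish(42)).start();
        System.out.println(box.awaitPayload()); // prints 42
    }
}

/** Local stand-in for the proposed class; the real name and package were undecided in the draft. */
final class PerformanceHints {
    private PerformanceHints() { }

    /** Carries no semantic requirements, so an empty body is a valid implementation. */
    static void spinLoopHint() { }
}
```

Because the hint has no semantic effect, removing the call leaves the program's behavior unchanged; a JVM with an intrinsic would simply replace the empty call with a platform instruction such as PAUSE on x86.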

Alternatives

JNI can be used to spin loop with a spin-loop-hinting CPU instruction, but the JNI-boundary crossing overhead tends to be larger than the benefit provided by the instruction, at least where latency is concerned. 

We could attempt to have the JIT compilers deduce spin-loop situations and code, and choose to automatically include a spin-loop-hinting CPU instruction with no Java code hints required. We expect that the complexity of automatically and reliably detecting spinning situations, coupled with questions about potential tradeoffs in using the hints on some platforms, would delay the availability of viable implementations significantly.

Testing

Testing of a "vanilla" no-op implementation will obviously be fairly simple. 

We believe that given the very small footprint of this API, testing of an intrinsified x86 implementation in OpenJDK will also be straightforward. We expect testing to focus on confirming both the code generation correctness and latency benefits of using the spin loop hint with an intrinsic implementation.

Should this API be proposed as a Java SE API (e.g. for inclusion in the java.* namespace in a future Java SE 9 or Java SE 10), we expect to develop associated TCK tests for the API for potential inclusion in the Java SE TCK.

Risks and Assumptions

The "vanilla" no-op implementation is obviously fairly low risk. An intrinsic x86 implementation will involve modifications to multiple JVM components and as such carries some risk, but no more than other simple intrinsics added to the JDK.


[4] HotSpot WebRevs for prototype implementation which intrinsifies org.performancehints.SpinHint.spinLoopHint(): http://ivankrylov.github.io/spinloophint/webrev/
[5] JDK WebRevs for prototype intrinsifying implementation: http://ivankrylov.github.io/spinloophint/webrev.jdk/
[6] Build environment WebRevs for prototype intrinsifying implementation: http://ivankrylov.github.io/spinloophint/webrev.main/
[7] Link to a working Linux prototype OpenJDK9-based JDK (accepts optional -XX:+UseSpinLoopHintIntrinsic): https://www.dropbox.com/s/r2w1s1jykr2qs01/slh-openjdk-9-b70-bin-linux-x64.tar.gz?dl=0


Martin Thompson

Oct 5, 2015, 2:45:26 AM
to mechanical-sympathy
Thanks Gil. I love how our little conversations get to bear fruit like this.

I'm often writing C and Java versions of the same algorithms side-by-side. Access to instructions like PAUSE is one of the major advantages of C for concurrent algorithms. The reduction in latency by preventing speculative execution is great. What is really nice to see is how much better the server runs on the whole due to enabling the benefits of hyper threading and reduction in power usage so that turbo boost can work better. For me this widens the usage to general high throughput applications, such as real-time stream processing, and not just low-latency finance applications. With each generation of x86 processors we are seeing hyper threading improve, and without the use of instructions like PAUSE in concurrent algorithms we are less likely to enjoy the benefits.

Kirk Pepperdine

Oct 5, 2015, 4:38:21 AM
to mechanica...@googlegroups.com
+1, let's hope it’s accepted.

Regards,
Kirk


Todd Montgomery

Oct 5, 2015, 10:27:25 AM
to mechanical-sympathy
Thanks, Gil. As I have worked on the Aeron C++ API and played with the performance of it, I've seen that having access to pause has quite a big impact on system resources. I very much hope this is accepted! It is quite needed!

-- Todd L. Montgomery 

Jean-Philippe BEMPEL

Oct 5, 2015, 10:35:32 AM
to mechanical-sympathy
Thanks a lot, Gil, for pushing this!
Like other low latency fellows, we need this too, and it is very difficult to find a workaround for this.
It also seems very harmless. I do not think there are significant drawbacks to introducing this, even if only a few people will really benefit from it.

Vitaly Davidovich

Oct 5, 2015, 10:47:21 AM
to mechanical-sympathy

Yes, this is a good proposal.  What I'd really like is better (read: faster) JNI/FFI so one can get access to whatever hardware features not exposed by JDK with less friction.

As for workarounds, this was briefly touched upon in this list here: https://groups.google.com/forum/m/#!topic/mechanical-sympathy/4StNyfMMn9o.  The tl;dr is you intentionally cause a branch mispredict to kill the speculation pipeline; unsatisfactory to say the least though.


Georges Gomes

Oct 5, 2015, 6:37:26 PM
to mechanica...@googlegroups.com
Thanks Gil,
This is great. Let us know the best way to back this initiative!

Christoph Engelbert

Oct 6, 2015, 8:28:45 AM
to mechanical-sympathy
Thanks Gil, very nice addition!

+1

Kristoffer Sjögren

Oct 8, 2015, 7:26:23 PM
to mechanical-sympathy
Awesome to see awesome people step up to drive Java and our community forward. This attitude brings more than just technical benefits. A big thank you and keep up the good work!

