Hi,
On 11/21/2016 07:58 PM, Sergey Melnikov wrote:
> On Linux (I'm not sure about Windows, but I think Windows does it the
> same way), the kernel maps the thread-local segment for the current
> thread via the FS segment register. So, in C/C++ it's possible to read
> a value from TLS with only 1 (!) instruction:
> 10: 64 48 8b 04 25 00 00 mov %fs:0x0,%rax
This is JVM- and platform-specific, but HotSpot on x86_64 does keep the
current thread pointer, and with it the VM's thread-local data, in a
register (r15):
http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/225b91f1b118/src/cpu/x86/vm/x86_64.ad
-- you could do the same on 32-bit, going through the FS segment.
> So, what do you think, is it worth having a lightweight TLS API
> (inspired by C/C++) in Java? Or maybe it's possible to implement it
> in the JVM right now?
Note that java.lang.ThreadLocal and native thread TLS are beasts from
different worlds. Mapping one onto the other would require crossing the
JDK<->VM boundary in several places.
An obvious idea would be reserving an indexed slot in native TLS for
each ThreadLocal instance's value, right? Then you can "just" read
ThreadLocal.idx and emit mov 0x$idx(%r15), %dst on read -- voila! The
same goes if we ditch ThreadLocal and expose a raw TLS.{get|set}(int
idx, T val).
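To make the slot-per-instance idea concrete, here is a minimal pure-Java sketch. It assumes a hypothetical SlotThread subclass whose Object[] field stands in for the native TLS block addressed via %r15; SlotThread, SlotLocal and SlotLocalDemo are illustrative names, not a JDK or HotSpot API. A get() is just a load at a fixed index, which is the part a JIT could turn into a single mov.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class SlotLocalDemo {
    // Hypothetical: a thread carrying its own slot array. The array stands in
    // for the native TLS block that compiled code would address via %r15.
    public static class SlotThread extends Thread {
        Object[] slots = new Object[4];

        public SlotThread(Runnable task) {
            super(task);
        }
    }

    // Hypothetical ThreadLocal replacement: every instance owns one global index.
    public static final class SlotLocal<T> {
        private static final AtomicInteger NEXT_INDEX = new AtomicInteger();
        // Indices are never recycled in this sketch.
        private final int idx = NEXT_INDEX.getAndIncrement();

        @SuppressWarnings("unchecked")
        public T get() {
            Object[] s = ((SlotThread) Thread.currentThread()).slots;
            // Fast path: a bounds check plus a load at a fixed index,
            // which is what a JIT could reduce to one mov off the thread register.
            return idx < s.length ? (T) s[idx] : null;
        }

        public void set(T value) {
            SlotThread t = (SlotThread) Thread.currentThread();
            if (idx >= t.slots.length) {
                // Grow lazily, only in the thread that needs the slot.
                t.slots = Arrays.copyOf(t.slots, Math.max(idx + 1, t.slots.length * 2));
            }
            t.slots[idx] = value;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SlotLocal<String> name = new SlotLocal<>();
        String[] seen = new String[2];
        SlotThread a = new SlotThread(() -> { name.set("a"); seen[0] = name.get(); });
        SlotThread b = new SlotThread(() -> seen[1] = name.get()); // b never sets it
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(seen[0] + " " + seen[1]); // prints "a null"
    }
}
```

Note that the sketch quietly dodges the hard parts: slots is an ordinary heap array that the GC already scans and that can grow per thread, and indices are never reused. A native TLS block would get none of that for free.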
But then complications emerge: you basically want to store Java
references in native TLS storage, which means you need to make sure it
works nicely with GC: e.g. slots get recycled properly when ThreadLocal
objects die, GC detects reachability via TLS slots (which probably
requires TLS to get scanned as part of the root set now? or some special
case in all GCs?), all barriers for stores and loads are in place, etc.
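Slot recycling alone shows the shape of the problem. A minimal sketch, assuming Java 9+ java.lang.ref.Cleaner and a hypothetical SlotAllocator class (not a JDK API): an index goes back to a free list once its owner becomes unreachable. The comments mark what it leaves out, namely clearing the stale value in every thread's slot, which is exactly where the GC and all-threads coordination come in.

```java
import java.lang.ref.Cleaner;
import java.util.ArrayDeque;

public class SlotAllocator {
    private static final Cleaner CLEANER = Cleaner.create();

    private final ArrayDeque<Integer> free = new ArrayDeque<>();
    private int next;

    // Reuse a released index if one is available, otherwise take a fresh one.
    public synchronized int allocate() {
        return free.isEmpty() ? next++ : free.pop();
    }

    // Caveat: returning the index is not enough. Every thread may still hold a
    // stale value in this slot, so a real version would also have to clear it
    // in all threads, or that value stays reachable and leaks into the next owner.
    public synchronized void release(int idx) {
        free.push(idx);
    }

    // Tie the index to the owner's lifetime: once the owner (e.g. a
    // ThreadLocal-like object) becomes unreachable, the Cleaner thread
    // hands the index back.
    public int allocateFor(Object owner) {
        int idx = allocate();
        // The action must not capture owner, or owner would never become unreachable.
        CLEANER.register(owner, () -> release(idx));
        return idx;
    }
}
```

The cleaning action runs on the Cleaner's own thread, which is why allocate() and release() are synchronized.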
Then you'd realize doing this from Java is complicated, because user
code has no business roaming around native TLS, where some interesting
VM-specific things lie (e.g. GC/runtime flags, queues, etc). You would
probably have to query the VM about thread specifics: what is available
and what is not. Coexistence would be interesting, because there is
already one heavy TLS user -- the JVM itself.
Not to mention that instantiating a ThreadLocal now has to have global
effects on all TLSes, because slots must match between threads. (One of
the nice properties of the thread-local Map<ThreadLocal, V> is that I
can init a ThreadLocal from a single thread only, at no cost to other
threads.)
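For contrast, the stock java.lang.ThreadLocal keeps values in a per-thread map, so creating one touches no thread, and the initializer passed to ThreadLocal.withInitial runs lazily, only in threads that actually call get(). A small demo of that property (the class and field names are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LazyThreadLocalDemo {
    // Counts how many times the per-thread initializer has run, across all threads.
    public static final AtomicInteger INIT_CALLS = new AtomicInteger();

    // Creating the ThreadLocal touches no thread; each thread's value is created
    // lazily, on that thread's first get().
    public static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(() -> {
                INIT_CALLS.incrementAndGet();
                return new StringBuilder();
            });

    public static void main(String[] args) throws InterruptedException {
        Thread user = new Thread(() -> BUFFER.get().append("x"));
        Thread bystander = new Thread(() -> { /* never touches BUFFER */ });
        user.start();
        bystander.start();
        user.join();
        bystander.join();
        System.out.println(INIT_CALLS.get()); // prints 1
    }
}
```

The bystander thread, like the main thread, pays nothing for BUFFER's existence; a globally indexed slot scheme would have to reserve space in every thread's TLS regardless.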
After all that bi-directional work is done, you'd need to prove it
works equally well across all the other architectures OpenJDK supports ;)
This is to say the whole ordeal is not as easy as it might sound. "Just
do the intrinsics for them!" is not gonna cut it.
Thanks,
-Aleksey