Safepoints, memcpy, and Unsafe.copyMemory


Kevin Burton

Jul 16, 2013, 3:45:34 PM
to mechanica...@googlegroups.com
So I was reading the JVM documentation dealing with copying memory and Unsafe when I came across this:

    // This number limits the number of bytes to copy per call to Unsafe's
    // copyMemory method. A limit is imposed to allow for safepoint polling
    // during a large copy

... in Bits.java

Basically, it looks like the copyArray methods use Unsafe.copyMemory, which I assume internally uses memcpy.

And I imagine the reason they are paging over 1MB chunks is so that when the GC kicks in they can lower the latency for that method to return.

If you're copying 100MB you're going to have a GC hiccup if you run out of memory.  Better to just pause during a large copy since you're doing 1MB at a time.
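The chunking described in that comment boils down to a simple loop. Here's a sketch of the pattern (the 1 MB threshold matches the constant in java.nio.Bits, but the helper shape and the `RawCopier` stand-in for Unsafe are my reconstruction, not verbatim JDK source):

```java
// Sketch of the safepoint-friendly chunked copy in java.nio.Bits
// (reconstruction, not verbatim JDK source). Copies are capped at 1 MB per
// Unsafe.copyMemory call so the thread crosses a safepoint poll between chunks.
public class ChunkedCopy {
    static final long UNSAFE_COPY_THRESHOLD = 1024L * 1024L; // 1 MB per chunk

    /** Stand-in for Unsafe.copyMemory(long, long, long), so the loop is testable. */
    interface RawCopier { void copy(long srcAddr, long dstAddr, long bytes); }

    static void copyMemory(RawCopier unsafeCopy, long srcAddr, long dstAddr, long length) {
        while (length > 0) {
            long size = Math.min(length, UNSAFE_COPY_THRESHOLD);
            unsafeCopy.copy(srcAddr, dstAddr, size);
            length -= size;
            srcAddr += size;
            dstAddr += size;
        }
    }
}
```

Each iteration returns from the native/intrinsic copy, giving the JVM a chance to bring the thread to a safepoint before the next chunk starts.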

Seems reasonable of course but more for me to implement and certainly ANOTHER argument for not using Unsafe if you can avoid it.

I'm just sick of having to dive into the GC internals to figure out what brain damage I have to route around and this way I can just do my own memory management.  

Gil Tene

Jul 18, 2013, 5:42:06 AM
to mechanica...@googlegroups.com
Yeah, as I noted in some other postings, Time To Safepoint (TTSP) can be a silent killer. Largely because it goes unreported in GC pause output for most JVMs (it usually only shows up under '-XX:+PrintGCApplicationStoppedTime' in Oracle HotSpot and OpenJDK), and also because it's a Russian roulette game in which you mostly live, making it hard to test for.

Unlike with regular JNI calls (in which your code runs in a safepoint), you should assume that any Unsafe call you make can effectively become a JVM-wide blocking operation (for the duration of the Unsafe call). Most of the time it won't, but if the JVM happens to be trying to get to a safepoint (for GC, deoptimization, deadlock detection, or whatever) while you are in an Unsafe call that does not actually cross the JNI boundary, all other threads in the JVM will end up pausing at least until your call completes. Not a real issue for short calls (like atomic ops), but a very real issue for things like memory copies.

Kirk Pepperdine

Jul 18, 2013, 7:49:09 AM
to mechanica...@googlegroups.com
Hi Gil,

It's a silent killer in a number of ways beyond not being reported unless you report on application stopped time. Frequent safepointing can be a hidden source of scheduling pressure. It doesn't show up as contention, since this isn't a lock; it behaves more like a page fault. In one case I was able to tune an app that was only capable of spinning the CPU up to ~75% to reach 99%, simply by working on reducing the number of safepointing operations. Things like biased locking can be very, very disruptive because of the amount of safepointing involved when using it in Oracle's current implementation. But my guess is you already know this. ;-)

-- Kirk
> --
> You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

Jason Koch

Jul 18, 2013, 7:57:13 AM
to mechanica...@googlegroups.com
Hi Kirk / Gil / anyone with expertise - I'd love to hear a bit more about this. I understand some of it will be IP but if you can share anything that might help us to understand in a bit more detail some of the triggers for TTSP I'd be very appreciative

It seems like the implication you make is that not only is JNI a slow handover but it is also likely to trigger a serialisation between JNI calls due to safepointing (or did I read too much into that ...). If I have a series of long-running JNI calls, they would effectively hold the entire VM up -- this does not seem to be my experience with JNI .. although to be fair I've never examined for this behaviour.

Thanks
Jason

Jean-Philippe BEMPEL

Jul 18, 2013, 9:58:15 AM
to mechanica...@googlegroups.com
+1 on BiasedLocking.
Depending on the code pattern, it can be very annoying when (bulk) revoking (a safepoint operation) kicks in.

Martin Thompson

Jul 18, 2013, 10:00:47 AM
to mechanica...@googlegroups.com
I've generally found biased locking to be very useful when you are absolutely sure you never have contention on locks.  If you have any contention at all then best to disable it to improve the latency tail.

Peter Lawrey

Jul 18, 2013, 11:20:55 AM
to mechanica...@googlegroups.com
I agree that synchronized works best for low levels of contention. Unfair locks work better for moderate contention, and fair locks work best for significant contention. Ideally you recode your application if you see significant contention.
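For concreteness, the fair/unfair distinction maps directly onto java.util.concurrent.locks.ReentrantLock, whose constructor takes a fairness flag (intrinsic synchronized locks are always unfair):

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockFairness {
    // Unfair (barging) lock: better throughput, worse latency tail under contention.
    static final ReentrantLock UNFAIR = new ReentrantLock();

    // Fair lock: roughly-FIFO hand-off, lower throughput but bounded waiting.
    static final ReentrantLock FAIR = new ReentrantLock(true);
}
```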



Gil Tene

Jul 18, 2013, 11:23:21 AM
to <mechanical-sympathy@googlegroups.com>, mechanica...@googlegroups.com
I.e. biased locking is very useful on locks that shouldn't exist ;-).

It is also "very useful" for artificially improving SPECjbb results. There is a "good" reason biased locking doesn't kick in until a few minutes into the JVM uptime. It's called SPECjbb. In SPECjbb, allowing biasing earlier would mean that millions of thin-locked objects created when initializing the benchmark warehouses would then need to be de-biased one-by-one during the timing run, bringing benchmark results down to practically nothing. But if you hold back long enough, then biasing artificially buys you a significant gain (as in 5+%) in the benchmark. Good news if your only goal is a better number on this specific benchmark. Bad news if you have a real app with a producer/consumer relationship.

In Zing, we've completely backed off from using Biased Locking, and instead use per-thread safe pointing and what we call "owner inflation" to eliminate one of the two CAS operations used in hot/uncontended synchronization (a monitor enter and subsequent monitor exit pair associated with synchronized blocks or methods). Biased Locking was able to eliminate both CASs, but the heavy disruption cost (even with per-thread safepointing) in all these real world program phase change cases just wasn't worth it.

We may revisit in the future, but our experience shows that everyone (outside of some batch and µ-benchmark apps) tends to avoid Biased Locking like the plague.


Peter Lawrey

Jul 18, 2013, 11:50:28 AM
to mechanica...@googlegroups.com
On 18 July 2013 16:23, Gil Tene <g...@azulsystems.com> wrote:
I.e. biased locking is very useful on locks that shouldn't exist ;-).

Biased locking works very well on code which would be faster if it were single threaded. e.g. lots of simple micro-benchmarks suffer from this. 

Gil Tene

Jul 18, 2013, 12:13:40 PM
to <mechanical-sympathy@googlegroups.com>, mechanica...@googlegroups.com
Jason, JNI doesn't have this problem. Unsafe does. JNI carries a per-call cost in both directions, but it does not serialize global safepoints.

The JVM stalls and sees a high TTSP only with threads that go for a prolonged period of time without crossing a safepoint opportunity (a point where the JVM can have the thread stall at a safepoint). JNI doesn't have this problem, as each JNI call is "one big safepoint opportunity".

When I said "with regular JNI calls (in which your code runs in a safepoint)" below, I was referring to the fact that the entire JNI code execution is a safepoint from the thread's perspective, meaning that the JVM can safely look at the thread's stack and machine state anywhere during the JNI execution. In fact, JNI code keeps freely executing even during a global JVM safepoint (you just can't execute past it). In most JVMs, entering JNI releases the thread's "JVM lock", allowing the JVM to grab it at will and preventing the thread from proceeding out of the JNI call (by either returning to the calling Java code or by calling into a JNI C API call that interacts with heap state).

In contrast to JNI code, threads in normal/regular Java code and in other runtime-but-not-JNI code hold onto their JVM lock and don't let it go until asked to do so via some sort of "please come to a safepoint" request. When a thread notices the request (which only happens as it crosses a safepoint opportunity), it hands its JVM lock to the JVM and waits to be released.

It's code that goes for a long period of time without crossing a safepoint opportunity, and without already being at one (as is the case with JNI and pretty much all blocking calls), that is problematic. That's where long TTSPs come from. There are plenty of examples of normal Java things that can "accidentally" cause high TTSPs on JVMs that don't specifically work to avoid it. Classic examples are array copies and large object allocations.

You can see an example program I posted in a separate thread ( https://groups.google.com/forum/m/#!topic/mechanical-sympathy/vO7oq9aiG4Y ) that demonstrates an arbitrarily long TTSP on JVMs that don't spend the time to avoid this effect.

Many long TTSPs in most JVMs are "the JVM's fault", while JVMs that really care about latency consistency tend to invest a lot of engineering under the hood to minimize those long TTSP paths.

In Zing, we have a built-in TTSP profiler that lets us hunt down long TTSP paths, and both Azul and our latency-sensitive customers make frequent use of it to work out TTSP kinks. Using this profiling, the Zing JVM has had years of TTSP reduction work done to reduce and minimize these paths, but there are examples where a multi-millisecond TTSP is the result of user code semantics, is not "the JVM's fault", and is something you can affect (in both good and bad ways). A classic example is memory access into a mapped file whose contents are not locked in memory. Such access can stall the accessing thread (not at a safepoint) for many milliseconds as the buffer is brought into memory. Our customers often find this sort of thing with our TTSP profiler, without waiting for the Russian roulette to roll around to the unlucky slot in production.
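One common mitigation for the mapped-file case is to page the mapping in up front. A sketch (hedged: MappedByteBuffer.load() is best-effort advice to the OS, not a pinning guarantee):

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class PreloadMapping {
    // Map a file region and ask the OS to page it in, so later accesses are
    // less likely to stall the thread (outside a safepoint) on a page fault.
    static MappedByteBuffer mapAndLoad(String path, long size) throws Exception {
        try (RandomAccessFile f = new RandomAccessFile(path, "rw");
             FileChannel ch = f.getChannel()) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            buf.load(); // best-effort page-in; not a substitute for mlock()
            return buf;
        }
    }
}
```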


Kevin Burton

Jul 18, 2013, 4:56:45 PM
to mechanica...@googlegroups.com
Another issue here isn't just JNI, JNA, or Unsafe but flat out calling system calls with non-deterministic runtime.

fsync() is one of them... it could take a somewhat long time for that to return.  

Michael Barker

Jul 18, 2013, 5:08:26 PM
to mechanica...@googlegroups.com
But it won't block the whole JVM if the GC is waiting for the threads
to arrive at a safe point, only the currently running thread. I don't
know this for sure, but I'm guessing that syscalls that potentially
block or take a really long time (read/write/poll/select/fsync) are
one of the reasons why you really really want the GC to be able to run
while a thread is executing native code and probably drove that
element of the JVM design.

Mike.

Gil Tene

Jul 18, 2013, 5:10:13 PM
to
Yes. fsync() can take a long time, but it won't stall the JVM, and won't affect other application threads (as long as what they are doing is not stalled by the actual file system sync). The reason for this is that the calling thread is at a safepoint when it makes the blocking fsync() call, allowing the JVM to get in and out of global safepoints at will.

Pretty much all blocking calls in the JVM are done like this, so TTSP is not an issue for blocking calls. It's an issue for long running work in the application code or in the runtime (but not in JNI).

-- Gil.

Kirk Pepperdine

Jul 19, 2013, 4:47:26 AM
to mechanica...@googlegroups.com

On 2013-07-18, at 11:09 PM, Gil Tene <g...@azulsystems.com> wrote:

Yes. fsync() can take a long time, but it won't stall the JVM, and won't affect other application threads (as long as what they are doing is not stalled by the actual file system sync). The reason for this is that the calling thread is at a safepoint when it makes the blocking fsync() call, allowing the JVM to get in and out of global safepoints at will.

Pretty much all blocking calls in the JVM are done like this, so TTSP is not an issue for blocking calls. It's an issue for long running work in the application code or in the runtime (but not in JNI).

Well, long-running work is handled quite well... what isn't handled well is code in tight loops. There the thread will not be at a safepoint, and that will stall GC... which is unfortunate because of how loops can be optimized.
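A minimal illustration of the tight-loop case (hedged: whether polls are elided depends on the JIT and JVM version; HotSpot has historically omitted safepoint polls inside int-counted loops):

```java
public class CountedLoopTtsp {
    // An int-counted loop: HotSpot's JIT can compile the body with no
    // safepoint poll per iteration, so with a huge array the thread may
    // not reach a safepoint until the loop exits -- a long-TTSP hazard.
    static long sum(int[] data) {
        long total = 0;
        for (int i = 0; i < data.length; i++) {
            total += data[i];
        }
        return total;
    }
}
```

Using a long induction variable, or the -XX:+UseCountedLoopSafepoints flag on later HotSpot versions, keeps polls in the loop at some cost to loop optimization.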

-- Kirk



Vladimir Rodionov

Jul 23, 2013, 10:53:12 PM
to mechanica...@googlegroups.com
I have never seen any real issue with Unsafe.copyMemory (except the fact that it is not supported in some OpenJDK versions). It seems that all of Unsafe is rock solid. But we are still in the development-testing phase of the product.

Our custom all-Java off-heap memory management is built entirely on top of Unsafe. Tested up to 240GB with 20M alloc/free per sec over 24 hours.

 --best

Kevin Burton

Jul 23, 2013, 11:11:25 PM
to mechanica...@googlegroups.com
I got sucked into writing my own implementation, which is really JUST Unsafe… but I'm trying to avoid the lock in Bits since I do lots of small allocations.

What's interesting is that my benchmarks show that Unsafe is *slightly* slower than a LITTLE_ENDIAN DirectByteBuffer.  I'm not sure why… I'm going to work on some code to get the benchmark smaller to see what's up… it's not AMAZINGLY slower… maybe like 5%.  I just expected Unsafe to be 20% or so faster.

Another benefit is that you can work with >2GB mmap files or memory slabs.  And in my use case that's another perk!

Michael Barker

Jul 24, 2013, 12:10:22 AM
to mechanica...@googlegroups.com
You may want to compare the alignment of the DirectByteBuffer and the
memory allocated by Unsafe.

Mike.

Kevin Burton

Jul 24, 2013, 12:45:00 AM
to mechanica...@googlegroups.com
I think I'm going to verify this just to be sure.

But Unsafe's documentation says that it's aligned.

Here's a good deep dive into this topic btw:

Michael Barker

Jul 24, 2013, 12:49:58 AM
to mechanica...@googlegroups.com
It's word aligned but not cache aligned, if some of your operations
are straddling cache-lines you could be paying some extra cost there.

Mike.
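A common workaround for straddling is to over-allocate and round the base up to a cache-line boundary. A sketch, assuming 64-byte cache lines (the Unsafe usage is shown in comments only):

```java
public class CacheAlign {
    static final long CACHE_LINE = 64; // assumption: 64-byte cache lines

    /** Round address up to the next multiple of alignment (a power of two). */
    static long alignUp(long address, long alignment) {
        return (address + alignment - 1) & ~(alignment - 1);
    }

    // Sketched usage with Unsafe: over-allocate, keep the raw base for
    // freeMemory(), and do all reads/writes from the aligned address.
    //   long base    = unsafe.allocateMemory(size + CACHE_LINE - 1);
    //   long aligned = alignUp(base, CACHE_LINE);
}
```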

Kevin Burton

Jul 24, 2013, 11:51:24 PM
to mechanica...@googlegroups.com
Ah...  Actually yes.  Diving into it, DirectByteBuffer does do some alignment.

Here's the code in question from JDK 1.7:
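A hedged reconstruction of the relevant allocation math from the OpenJDK 7 DirectByteBuffer constructor (names and the 4 KB page size are assumptions from memory, not a verbatim copy of the JDK source):

```java
public class DirectAllocSketch {
    static final long PAGE_SIZE = 4096; // assumption: typical 4 KB pages

    // Mirrors `size = Math.max(1L, (long) cap + (pa ? ps : 0))`: when page
    // alignment is requested, a full extra page is malloc'ed beyond `cap`.
    static long allocationSize(long cap, boolean pageAligned) {
        return Math.max(1L, cap + (pageAligned ? PAGE_SIZE : 0));
    }

    // Mirrors the constructor's rounding of the malloc'ed base up to the
    // next page boundary when it isn't already page aligned.
    static long alignedAddress(long base, boolean pageAligned) {
        if (pageAligned && base % PAGE_SIZE != 0) {
            return base + PAGE_SIZE - (base & (PAGE_SIZE - 1));
        }
        return base;
    }
}
```

Note that the full `size` (capacity plus a page) is what gets malloc'ed; whether the same amount is reserved against the direct-memory limit is the discrepancy discussed in the next message.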


Very interesting!

Kevin Burton

Jul 24, 2013, 11:57:21 PM
to mechanica...@googlegroups.com
Actually... there's a bug in the JVM here.

They allocate capacity + pageSize but only 'reserve' capacity.  

This means that the MAX_DIRECT_MEMORY limit isn't honored correctly and you could actually allocate MORE direct memory than allowed.

This can have all sorts of BAD consequences including hitting the OOM killer.  

I'm really surprised by the severe LACK of quality in a lot of this code.  I think a lot of it doesn't see the light of day, and when you dive into it you really start to see lots of breakage.

I've found 2-3 pretty severe problems in this code.

Makes me think I should look at the source of Unsafe too.

Peter Lawrey

Jul 25, 2013, 3:47:34 AM
to mechanica...@googlegroups.com
This can have all sorts of BAD consequences including hitting the OOM killer.

You would have to be pretty close to the OOM killer's limit.  I suspect the assumption is that you don't create lots of these and you are nowhere near your process limits.

BTW If you can see better ways of doing this, you really should submit RFEs. ;)


Roman Leventov

Feb 21, 2018, 7:23:52 AM
to mechanical-sympathy
Has this problem been fixed in OpenJDK?

https://bugs.openjdk.java.net/browse/JDK-8149596 description says that "JDK-8141491 moves the copy functionality from libjava.so/Bits.c into the VM and Unsafe. The wrapper methods are no longer needed and the nio code should be updated to call them directly instead." However, I cannot find any code that looks like safepoint poll insertion or copy chunking in Unsafe.java (http://hg.openjdk.java.net/jdk-updates/jdk9u/jdk/file/d54486c189e5/src/java.base/share/classes/jdk/internal/misc/Unsafe.java), unsafe.cpp (http://hg.openjdk.java.net/jdk-updates/jdk9u/hotspot/file/22d7a88dbe78/src/share/vm/prims/unsafe.cpp), or copy.cpp (http://hg.openjdk.java.net/jdk-updates/jdk9u/hotspot/file/22d7a88dbe78/src/share/vm/utilities/copy.cpp).