sun.misc.Unsafe.getAndAddInt implementation, question.


John Hening
Oct 14, 2017, 9:58:17 AM
to mechanical-sympathy
Hello

Here is the implementation of getAndAddInt from sun.misc.Unsafe:


public final int getAndAddInt(Object ptr, long offset, int value) {
    int curr;
    do {
        curr = this.getIntVolatile(ptr, offset);                          // (1)
    } while (!this.compareAndSwapInt(ptr, offset, curr, curr + value));   // (2)
    return curr;
}




Why is Unsafe.getIntVolatile() called here instead of Unsafe.getInt()?

I am broadly familiar with memory models, memory barriers, etc., but perhaps I am missing something important.

As I understand it, getIntVolatile here means: ensure the order of execution (1) -> (2).

It looks something like:

curr = read();
acquire();
CAS operation;


Obviously, acquire() depends on the CPU: on x86 it is a no-op, on ARM it is a memory barrier, etc.

My question/misunderstanding:

To my eye, the order is already ensured by the data dependency between the read of (ptr + offset) and the CAS operation on it. So I don't see a reason to worry about memory (re)ordering here.


Gil Tene
Oct 14, 2017, 7:34:34 PM
to mechanica...@googlegroups.com
A simple answer would be that the field is treated by the method as a volatile, and the code is simply staying consistent with that notion. Is an optimization possible here? Possibly. Probably. But does it matter? No. The source code involved is not performance critical, and is not worth optimizing. The interpreter may be running this logic, but no hot path would be executing the actual logic in this code... 

Why? Because the Java code you see there is NOT what the hot code would be doing on (most) JVMs. Specifically, optimizing JITs can and will identify and intrinsify the method, replacing its body with code that does whatever they want it to do. They don't have to perform any of the actual logic in the method, as long as they make sure the method performs its intended (contracted) function, and that contracted functionality is to perform a getAndAddInt on a field, treating it logically as a volatile.

For example, on x86 there is support for atomic add via the XADD instruction. Using XADD for this method's functionality has multiple advantages over doing the as-coded CAS loop. And most optimizing JITs will [transparently] use an XADD in place of a CAS in this case and get rid of the loop altogether.
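To make this concrete, here is a minimal, hypothetical sketch of the user-facing path that typically ends up in this intrinsic in the JDK 8 era: AtomicInteger.getAndAdd() delegates to Unsafe.getAndAddInt(), and on x86 an optimizing JIT will commonly compile the call down to a single LOCK XADD rather than the CAS loop written in the Java source (exact code generation depends on the JVM and version):

import java.util.concurrent.atomic.AtomicInteger;

public class XaddExample {
    private static final AtomicInteger counter = new AtomicInteger();

    public static void main(String[] args) {
        // AtomicInteger.getAndAdd() bottoms out in Unsafe.getAndAddInt().
        // On x86, HotSpot-style JITs typically intrinsify this to LOCK XADD,
        // so the CAS retry loop in the Java source never actually runs hot.
        int previous = counter.getAndAdd(5);
        System.out.println("previous = " + previous + ", now = " + counter.get());
    }
}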

John Hening
Oct 15, 2017, 5:12:30 AM
to mechanical-sympathy
Gil Tene, thank you very much.

OK, so does it mean that Unsafe.getIntVolatile() could be replaced with Unsafe.getInt()?

Gil Tene
Oct 16, 2017, 6:13:07 AM
to mechanica...@googlegroups.com
Ok. So the question below (ignoring other optimizations in the JVM that are specific to this method) is: "If I were doing this myself in some other method, would this logic still be valid if Unsafe.getIntVolatile() were replaced with Unsafe.getInt()?"

The answer IMO is "no".

The issue here is that unlike e.g. AtomicInteger.compareAndSet(), which is explicitly specified to include the behavior of a volatile read on the field involved, Unsafe.compareAndSwapInt() does not make any claims about exhibiting volatile read semantics. As a result, if you replace Unsafe.getIntVolatile() with Unsafe.getInt(), the resulting code:

public final int getAndAddInt(Object ptr, long offset, int value) {
    int curr;
    do {
        curr = this.getInt(ptr, offset);
    } while (!this.compareAndSwapInt(ptr, offset, curr, curr + value));
    return curr;
}

Can be validly transformed by the optimizer to:

public final int getAndAddInt(Object ptr, long offset, int value) {
    int curr = this.getInt(ptr, offset);
    do {
    } while (!this.compareAndSwapInt(ptr, offset, curr, curr + value));
    return curr;
}

Because:

(a) The optimizer can prove that if the compareAndSwapInt ever actually wrote to the field, the method would return and curr wouldn't be read again.
(b) Since the read of curr is not volatile, and the read in Unsafe.compareAndSwapInt() is not required to act like a volatile read, all the reads of curr can be reordered with all the reads in the compareAndSwapInt() operations, which means that they can be folded together and hoisted out of the loop.

If this valid optimization happened, the resulting code would get stuck in an infinite loop if another thread modified the field between the read of curr and the compareAndSwapInt, and that is obviously not the intended behavior of getAndAddInt()...
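As an aside, the same hoisting hazard can be demonstrated without Unsafe at all. The following standalone sketch (hypothetical class and field names; actual behavior depends on the JIT) spins on a plain, non-volatile field; a JIT that hoists the read out of the loop may keep the spinner running forever, which is exactly the failure mode described above:

public class HoistingHazard {
    // Plain (non-volatile) field: nothing forbids the JIT from hoisting its read.
    static int value;

    public static void main(String[] args) throws InterruptedException {
        Thread spinner = new Thread(() -> {
            // The JIT may read `value` once and spin on the cached result,
            // so this loop may never observe the write made by main().
            while (value != 42) {
                // busy-wait
            }
            System.out.println("observed 42");
        });
        spinner.start();
        Thread.sleep(1000);
        value = 42;
        spinner.join();   // may hang forever if the read was hoisted
    }
}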

John Hening
Oct 16, 2017, 10:30:13 AM
to mechanica...@googlegroups.com
Thanks, Gil Tene!

You are obviously right. The read is not volatile, so the compiler is allowed to reorder it. Moreover, since the read is not volatile, the compiler may assume that nobody else changes the value returned by getInt(), so it can hoist the read of curr out of the loop.


One last question (sorry for being inquisitive):

Let's assume that the compiler generated bytecode equivalent to 

public final int getAndAddInt(Object ptr, long offset, int value) {
    int curr;
    do {
        curr = this.getInt(ptr, offset);                                  // (1)
    } while (!this.compareAndSwapInt(ptr, offset, curr, curr + value));   // (2)
    return curr;
}



so it wasn't optimized. 

Now it seems to work correctly. But note that the CPU can still reorder operations. We are lucky because the CPU cannot reorder here: there is a data dependency (1) -> (2). So on every sensible CPU (one that respects data dependencies) it would work, yes?


Gil Tene
Oct 16, 2017, 4:47:07 PM
to mechanica...@googlegroups.com
The bytecode doesn't matter. It's not the javac compiler that will be doing the optimizations you should be worried about. It's the JIT compilers in the JVM. The javac-generated bytecode is only executed by the interpreter. The bytecode is eventually transformed to machine code by the JIT compiler, during which it will undergo aggressive optimization.

The "CPU" you should worry about and model in your mind is not x86, SPARC, or ARM. It's the JVM's execution engine and the JIT-generated machine code that does most of the actual execution. And that "CPU" will reorder the code more aggressively than any HW CPU ever would. The JIT's optimizing transformations include arbitrary and massive re-ordereing, reshaping, folding-together, and completely eliminating big parts of your apparent bytecode instructions. And the JIT will do all those as long as it can prove that the transformations are allowed.

John Hening
Oct 17, 2017, 10:22:21 AM
to mechanical-sympathy
thanks :)

