Java unsafe/bytebuffer flyweight disadvantages

Rajiv Kurian

Mar 25, 2015, 5:02:49 PM

to mechanica...@googlegroups.com

I was looking at this Java performance panel by Gil, Todd, Mike and Tal - http://www.infoq.com/presentations/panel-java-performance. One of the things that struck me was Gil talking about flyweights and how they are less efficient than plain object field accesses.

I use flyweights usually like this.

import sun.misc.Unsafe;

class FooFlyweight {
    private static final int INT_SIZE = 4;
    private static final int LONG_SIZE = 8;

    private static final int INT_FIELD_OFFSET = 0;
    private static final int LONG_FIELD_OFFSET = INT_FIELD_OFFSET + INT_SIZE; // 4
    private static final int TOTAL_SIZE_PER_FOO = LONG_FIELD_OFFSET + LONG_SIZE; // 12

    private final Unsafe unsafe; // Unsafe instance, obtained elsewhere
    private long basePtr;
    private int index;
    private long currentObjectStartPointer; // a long, since it holds a raw address

    FooFlyweight(Unsafe unsafe) { this.unsafe = unsafe; }

    // Point the flyweight at the start of the off-heap chunk.
    void resetPtr(long ptr) {
        basePtr = ptr;
    }

    // Point the flyweight at the index-th Foo in the chunk.
    void resetIndex(int index) {
        this.index = index;
        this.currentObjectStartPointer = basePtr + (long) TOTAL_SIZE_PER_FOO * index;
    }

    int getIntField() {
        return unsafe.getInt(currentObjectStartPointer + INT_FIELD_OFFSET);
    }

    long getLongField() {
        return unsafe.getLong(currentObjectStartPointer + LONG_FIELD_OFFSET);
    }
}

And I use it to iterate through a chunk of off-heap memory and treat it like an array of structs.

Why is it that getIntField() and getLongField() are less efficient than a simple object field access? Also, what is the difference between this and the equivalent code where we have an array of structs like struct Foo { int32_t a; int64_t b; };?
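To make the "array of structs" usage pattern concrete, here is a minimal sketch of the same flyweight idea backed by a ByteBuffer instead of Unsafe. The class name FooBufferFlyweight, the setters, and the sumAll and allocate helpers are hypothetical and not from the original post; only the 12-byte layout (int at offset 0, long at offset 4) is taken from it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical ByteBuffer-backed variant of the flyweight; names are illustrative.
final class FooBufferFlyweight {
    static final int INT_FIELD_OFFSET = 0;
    static final int LONG_FIELD_OFFSET = INT_FIELD_OFFSET + Integer.BYTES; // 4
    static final int TOTAL_SIZE_PER_FOO = LONG_FIELD_OFFSET + Long.BYTES;  // 12

    private final ByteBuffer buffer;
    private int recordStart; // byte offset of the record the flyweight currently points at

    FooBufferFlyweight(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    // Re-point the flyweight at record `index`; nothing is allocated here.
    FooBufferFlyweight resetIndex(int index) {
        recordStart = index * TOTAL_SIZE_PER_FOO;
        return this;
    }

    int getIntField() { return buffer.getInt(recordStart + INT_FIELD_OFFSET); }
    long getLongField() { return buffer.getLong(recordStart + LONG_FIELD_OFFSET); }
    void setIntField(int v) { buffer.putInt(recordStart + INT_FIELD_OFFSET, v); }
    void setLongField(long v) { buffer.putLong(recordStart + LONG_FIELD_OFFSET, v); }

    // Allocates off-heap space for `count` densely packed records.
    static ByteBuffer allocate(int count) {
        return ByteBuffer.allocateDirect(count * TOTAL_SIZE_PER_FOO).order(ByteOrder.nativeOrder());
    }

    // Sums both fields over `count` records, reusing a single flyweight instance.
    static long sumAll(ByteBuffer buffer, int count) {
        FooBufferFlyweight foo = new FooBufferFlyweight(buffer);
        long sum = 0;
        for (int i = 0; i < count; i++) {
            foo.resetIndex(i);
            sum += foo.getIntField() + foo.getLongField();
        }
        return sum;
    }
}
```

Every field read here goes through an offset computation plus a buffer access, which is the kind of code the question is asking about; the sketch makes no claim about how fast either variant is.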

Reynold Xin

Mar 25, 2015, 5:04:34 PM

to mechanica...@googlegroups.com

Address calculation can be slower because the JIT is not able to optimize those instructions as well.

One discussion in the past that touched upon this: https://groups.google.com/forum/#!topic/mechanical-sympathy/k0qd7dLHFQE

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Rajiv Kurian

Mar 25, 2015, 8:15:32 PM

to mechanica...@googlegroups.com

Nice, that led me to Nitsan's post too, which was pretty informative.


Nitsan Wakart

Mar 29, 2015, 8:06:24 AM

to mechanica...@googlegroups.com

The advantages of the Fly-Weight 'object'/struct:

- On new-ish Intel (OS/old-chip caveats apply) you can choose to pack fields densely (i.e. pay no attention to field type alignment), which improves locality at the price of potential 'tearing' (if a field straddles a cache line boundary). Unaligned access within a cache line is mostly safe and performant. It's also much easier to tailor your own layout and alignment, so you can choose field and 'object' alignment such that unaligned access and straddling are avoided.

- The Fly-Weight 'object' which is part of an array of such 'objects' enjoys a better memory access pattern compared to the array of references. This is to be addressed by the ObjectLayout initiative.
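The straddling trade-off in the first point can be checked with a little arithmetic. The sketch below is illustrative only: StraddleCheck and its methods are hypothetical names, and it assumes a 64-byte cache line and a chunk whose base address is line-aligned. It compares the packed 12-byte layout from the original question (int at 0, long at 4) with a padded 16-byte layout (int at 0, long at 8).

```java
// Hypothetical helper: counts fields that cross a cache-line boundary for a given layout.
final class StraddleCheck {
    static final int CACHE_LINE = 64; // assumed cache line size in bytes

    // True if a field of `size` bytes starting at `address` spans two cache lines.
    static boolean straddles(long address, int size) {
        return address / CACHE_LINE != (address + size - 1) / CACHE_LINE;
    }

    // Counts straddling fields over `records` back-to-back records, with the chunk starting at address 0.
    static int countStraddles(int recordSize, int fieldOffset, int fieldSize, int records) {
        int count = 0;
        for (int i = 0; i < records; i++) {
            if (straddles((long) i * recordSize + fieldOffset, fieldSize)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Packed: the long of record 10 occupies bytes 124..131 and crosses the line at 128.
        System.out.println(countStraddles(12, 4, 8, 16)); // prints 1
        // Padded: every long starts on an 8-byte boundary, so none cross a line.
        System.out.println(countStraddles(16, 8, 8, 16)); // prints 0
    }
}
```

Under these assumptions the packed layout saves 4 bytes per record, and 1 in 16 of its long fields crosses a cache line boundary.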

One downside is that your code is now a lot more confusing to the compiler, and this can lead to major optimizations not coming your way.

Here's an extreme example:

E.g. (bytes = new byte[1<<15], address = Unsafe.allocateMemory(1<<15))

for(int i=0;i<bytes.length;i++) bytes[i] = b; --> gets replaced with a specialized memset implementation (SIMD galore!!!)

Roughly 750 ns/op

vs.

for(int i=0;i<length;i++) Unsafe.putInteger(address+i,b); --> gets nuffin, we don't serve your kind here!

Roughly 15 µs/op

Alexander Turner

Mar 29, 2015, 1:07:47 PM

to mechanica...@googlegroups.com, nit...@yahoo.com

Am I reading that wrong, or should the second loop be i+=4? Otherwise you are looking at 4 times the operation time. Maybe in that case you will see the CPU do a hardware looping optimisation. Or, as I say, did I read it incorrectly?

Alexander Turner

Mar 29, 2015, 1:11:05 PM

to mechanica...@googlegroups.com

They are not less efficient at the machine code level. However, at the optimisation level they are less efficient. There might be many reasons for this, but one is the opportunity to inline and then elide operations on object fields. Escape analysis is another area. BTW, Gil speaks from a slightly different point of view, as Zing does not suffer from GC burden and so the advantages of off-heap are much smaller for him. Maybe one day the same will be said for all JVMs (I hope).

Nitsan Wakart

Mar 30, 2015, 3:55:10 AM

to mechanica...@googlegroups.com

You are right to correct the typo; this is a case of not using copy and paste and my fingers exercising some poetic licence. The benchmarked code, and therefore the numbers quoted, were for this code:

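// bytes, b, address and length are benchmark state fields set up as described earlier in the thread.
@Benchmark
public void manualBytes() {
    for (int i = 0; i < bytes.length; i++) {
        bytes[i] = b;
    }
}

@Benchmark
public void unsafeBytes() {
    for (int i = 0; i < length; i++) {
        UnsafeAccess.UNSAFE.putByte(address + i, b);
    }
}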


Rajiv Kurian

Mar 30, 2015, 5:14:56 PM

to mechanica...@googlegroups.com

I'd imagine Unsafe.setMemory() uses a memset implementation under the covers, though.

Vitaly Davidovich

Mar 30, 2015, 5:36:12 PM

to mechanica...@googlegroups.com

It looks like Unsafe.setMemory() only calls memset if it's allowed to fill the memory unaligned: http://hg.openjdk.java.net/jdk9/hs/hotspot/file/bab69a199d8f/src/share/vm/utilities/copy.cpp#l58. Otherwise, it does a hand-rolled fill (the memset call is what the last else branch ultimately calls into -- http://hg.openjdk.java.net/jdk9/hs/hotspot/file/bab69a199d8f/src/cpu/x86/vm/copy_x86.hpp#l65 for the x86/64 impl).

Also, it doesn't appear to be a JIT intrinsic, so there's going to be JNI overhead of calling this method.
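For reference, the two ways of filling off-heap memory discussed here can be put side by side. This is only a sketch under stated assumptions: the OffHeapFill class and its method names are hypothetical, and it assumes sun.misc.Unsafe can be obtained reflectively on the JVM in use (newer JDKs deprecate and restrict these memory-access methods). It checks that the two fills produce the same bytes and says nothing about their relative speed.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical helper comparing a byte-at-a-time fill with Unsafe.setMemory.
final class OffHeapFill {
    // Reflective lookup of the singleton; assumes sun.misc.Unsafe is accessible on this JVM.
    static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe not available", e);
        }
    }

    // Fills one region with a putByte loop and another with setMemory, then compares them.
    static boolean fillsAgree(int length, byte value) {
        Unsafe unsafe = loadUnsafe();
        long loopAddr = unsafe.allocateMemory(length);
        long bulkAddr = unsafe.allocateMemory(length);
        try {
            for (int i = 0; i < length; i++) {
                unsafe.putByte(loopAddr + i, value); // the loop shape from the benchmark
            }
            unsafe.setMemory(bulkAddr, length, value); // the bulk call discussed in this reply
            for (int i = 0; i < length; i++) {
                if (unsafe.getByte(loopAddr + i) != unsafe.getByte(bulkAddr + i)) {
                    return false;
                }
            }
            return true;
        } finally {
            unsafe.freeMemory(loopAddr);
            unsafe.freeMemory(bulkAddr);
        }
    }
}
```

Measuring which of the two is faster would need a proper harness such as the JMH benchmark quoted earlier in the thread.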


Rajiv Kurian

Mar 30, 2015, 5:59:33 PM

to mechanica...@googlegroups.com

Bummer.
