FlatBuffers, ByteBuffers, and escape analysis


Todd Lipcon

Aug 7, 2018, 4:55:35 PM
to mechanica...@googlegroups.com
Hey folks,

I'm working on reducing heap usage of a big server application that currently holds on to tens of millions of generated FlatBuffer instances in the old generation. Each such instance looks more or less like this:

    private static class FileDesc {
      private final ByteBuffer bb;
      int bbPos;

      FileDesc(ByteBuffer bb) {
        bbPos = bb.getShort(bb.position()) + bb.position();
        this.bb = bb;
      }

      public int getVal() {
        return bb.getInt(bbPos);
      }
    }

(I've simplified the code, but the important bit is the ByteBuffer member and the fact that it provides nice accessors which read data from various parts of the buffer)

Unfortunately, the heap usage of these wrapper objects adds up quite a bit -- each ByteBuffer takes 56 bytes of heap, and each 'FileDesc' takes 32 bytes after padding. The underlying buffers themselves are typically on the order of 100 bytes, so almost 50% of the heap is being used by wrapper objects instead of the underlying data itself. Additionally, two thirds of the object count is overhead, which I imagine contributes to GC scanning/marking time.

In practice, all of the ByteBuffers used by this app are simply ByteBuffer.wrap(byteArray). I was figuring that an easy improvement here would be to simply store the byte[] and whenever we need to access the contents of the FlatBuffer, use it as a flyweight:

  new FileDesc(ByteBuffer.wrap(byteArray)).getVal();

... and let the magic of Escape Analysis eliminate those allocations. Unfortunately, I've learned from this group that magic should be tested, so I wrote a JMH benchmark: https://gist.github.com/4b6ddf0febcc3620ccdf68e5f11c6c83 and found that the ByteBuffer.wrap allocation is not eliminated.

Has anyone faced this issue before? It seems like my only real option here is to modify the flatbuffer code generator to generate byte[] members instead of ByteBuffer members, so that the flyweight allocation would be eliminated, but maybe I missed something more clever.

-Todd

Gil Tene

Aug 8, 2018, 12:41:01 AM
to mechanica...@googlegroups.com
*IF* you can use post-Java-8 stuff, VarHandles may offer a more systemic and intentional/explicit answer for expressing what you are trying to do here, without resorting to Unsafe. Specifically, using a MethodHandles.byteArrayViewVarHandle() that you would obtain once (statically), you should be able to peek into your many different byte[] instances and extract a field of a different primitive type (int, long, etc.) at some arbitrary index, without having to wrap each one in the super-short-lived ByteBuffer in your example and hope for escape analysis to take care of it...

Here is a code example that does the same wrapping you were looking to do, using VarHandles:

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class VarHandleExample {

    static final byte[] bytes = {0x02, 0x00, (byte) 0xbe, (byte) 0xba, (byte) 0xfe, (byte) 0xca};

    private static class FileDesc {
        static final VarHandle VH_intArrayView =
                MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
        static final VarHandle VH_shortArrayView =
                MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);

        private final byte[] buf;
        int bufPos;

        FileDesc(byte[] buf, int headerPosition) {
            bufPos = ((short) VH_shortArrayView.get(buf, headerPosition)) + headerPosition;
            this.buf = buf;
        }

        public int getVal() {
            return (int) VH_intArrayView.get(buf, bufPos);
        }
    }

    public static void main(String[] args) {
        FileDesc fd = new FileDesc(bytes, 0);
        System.out.format("The int we get from fd.get() is: 0x%x\n", fd.getVal());
    }
}

Running this results in the (probably correct) output of:

The int we get from fd.get() is: 0xcafebabe

Which means that reading from the backing byte[] at byte offsets, in little-endian order, even at locations that aren't 4-byte aligned, seems to work.

NOTE: I have NOT examined what it looks like in generated code, beyond verifying that everything seems to get inlined, but as stated, the code would not incur an allocation or need an intermediate object per buffer instance.

Now, since this only works in Java 9+, you could code it that way for those versions, and revert to the Unsafe equivalent for Java 8 and earlier. You could even convert the code above to code that dynamically uses VarHandles (when available) without requiring javac to know anything about them (using reflection and MethodHandles), and uses Unsafe only if VarHandles are not supported. An ugly PortableVarHandleExample that does that (and would run on Java 7...10) *might* follow...
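For reference, the Java 8 Unsafe fallback alluded to above might look roughly like this sketch (class and method names are illustrative, not from any posted code; note that sun.misc.Unsafe reads in native byte order, so this matches the little-endian VarHandle example only on little-endian hardware):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeExample {
    private static final Unsafe UNSAFE;

    static {
        try {
            // The standard (unsupported) way to obtain the Unsafe singleton.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Read an int from buf at byte offset pos, with no wrapper object.
    // Reads in native byte order, unlike the explicit-order VarHandle.
    static int getInt(byte[] buf, int pos) {
        return UNSAFE.getInt(buf, Unsafe.ARRAY_BYTE_BASE_OFFSET + (long) pos);
    }

    public static void main(String[] args) {
        byte[] bytes = {0x02, 0x00, (byte) 0xbe, (byte) 0xba, (byte) 0xfe, (byte) 0xca};
        System.out.format("0x%x%n", getInt(bytes, 2));
    }
}
```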

Gil Tene

Aug 8, 2018, 2:45:12 AM
to mechanical-sympathy
Oh, and there is MethodHandles.byteBufferViewVarHandle if you (for some reason) want to do the same but keep ByteBuffers around.
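A sketch of that variant, reading the same little-endian int through a ByteBuffer view (usage assumed from the MethodHandles javadoc, not from posted code):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteBufferViewExample {
    // One VarHandle, obtained statically, viewing any ByteBuffer as ints.
    static final VarHandle VH_INT =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Plain get supports unaligned byte indexes; atomic access modes do not.
    static int getIntLE(ByteBuffer bb, int index) {
        return (int) VH_INT.get(bb, index);
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.wrap(
                new byte[]{0x02, 0x00, (byte) 0xbe, (byte) 0xba, (byte) 0xfe, (byte) 0xca});
        System.out.format("0x%x%n", getIntLE(bb, 2));
    }
}
```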

Todd Lipcon

Aug 8, 2018, 11:35:54 AM
to mechanica...@googlegroups.com
Thanks Gil. Unfortunately I'm stuck on Java 8 for now. And it sounds like I'll have to modify the FlatBuffers code generation either way to get rid of the ByteBuffer and replace it with at least some interface that could wrap a ByteBuffer, Unsafe, VarHandle, etc.

Todd


Todd Lipcon

Aug 8, 2018, 4:57:47 PM
to mechanica...@googlegroups.com
I tried one more approach using an interface with the appropriate 'getBytes', etc. methods, but unfortunately its allocation doesn't seem to be elided either:

MyBenchmark.testWithByteBuffer                                               thrpt    2  164225320.729           ops/s
MyBenchmark.testWithByteBuffer:·gc.alloc.rate.norm                           thrpt    2         56.000            B/op
MyBenchmark.testWithBytes                                                    thrpt    2  289869913.686           ops/s
MyBenchmark.testWithBytes:·gc.alloc.rate.norm                                thrpt    2         ≈ 10⁻⁷            B/op
MyBenchmark.testWithBytesWrappedInAccessor                                   thrpt    2  202213942.822           ops/s
MyBenchmark.testWithBytesWrappedInAccessor:·gc.alloc.rate.norm               thrpt    2         24.000            B/op
MyBenchmark.testWithBytesWrappedInThreadLocalAccessor                      thrpt    2  183097145.922           ops/s
MyBenchmark.testWithBytesWrappedInThreadLocalAccessor:·gc.alloc.rate       thrpt    2         ≈ 10⁻⁴          MB/sec

So while my little byte-array wrapper is smaller than ByteBuffer (and faster), it still isn't allocation-free. Using a ThreadLocal can eliminate the allocation but gives up a bit of performance.
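A sketch of a thread-local flyweight along those lines (illustrative names only; not the actual benchmark code from the gist):

```java
public class ThreadLocalFlyweight {
    // A mutable accessor that is re-pointed at a byte[] before each use.
    static final class Accessor {
        private byte[] buf;

        Accessor wrap(byte[] buf) {
            this.buf = buf;
            return this;
        }

        // Little-endian int read; low bytes are masked to avoid sign extension.
        int getInt(int pos) {
            return (buf[pos] & 0xff)
                 | (buf[pos + 1] & 0xff) << 8
                 | (buf[pos + 2] & 0xff) << 16
                 | buf[pos + 3] << 24;
        }
    }

    // One accessor per thread, so steady-state reads allocate nothing.
    private static final ThreadLocal<Accessor> ACCESSOR =
            ThreadLocal.withInitial(Accessor::new);

    static int readInt(byte[] data, int pos) {
        return ACCESSOR.get().wrap(data).getInt(pos);
    }

    public static void main(String[] args) {
        byte[] bytes = {0x02, 0x00, (byte) 0xbe, (byte) 0xba, (byte) 0xfe, (byte) 0xca};
        System.out.format("0x%x%n", readInt(bytes, 2));
    }
}
```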

So, does anyone have a clever idea to get the same performance as directly passing the byte array, but without any allocation, and in such a way that Java 8 is supported? (Clearly I could just locally hack the generator to support only byte[] and not ByteBuffer, but I'd prefer to contribute a change back to the flatbuffers project that maintains backward compatibility as well.) I suppose storing an Object and using 'instanceof' checks is an option, though it makes me sad.
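For what it's worth, the Object-plus-instanceof idea might look something like this sketch (names and layout assumed, not actual FlatBuffers generated code):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PolymorphicDesc {
    private final Object data;   // either a byte[] or a ByteBuffer
    private final int bbPos;

    PolymorphicDesc(Object data, int bbPos) {
        this.data = data;
        this.bbPos = bbPos;
    }

    public int getVal() {
        // Branch on the concrete type; the byte[] path avoids the wrapper.
        if (data instanceof byte[]) {
            byte[] buf = (byte[]) data;
            return (buf[bbPos] & 0xff)
                 | (buf[bbPos + 1] & 0xff) << 8
                 | (buf[bbPos + 2] & 0xff) << 16
                 | buf[bbPos + 3] << 24;
        }
        // Caller is assumed to have set little-endian order on the buffer.
        return ((ByteBuffer) data).getInt(bbPos);
    }

    public static void main(String[] args) {
        byte[] bytes = {0x02, 0x00, (byte) 0xbe, (byte) 0xba, (byte) 0xfe, (byte) 0xca};
        PolymorphicDesc fromArray = new PolymorphicDesc(bytes, 2);
        PolymorphicDesc fromBuffer = new PolymorphicDesc(
                ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN), 2);
        System.out.format("0x%x 0x%x%n", fromArray.getVal(), fromBuffer.getVal());
    }
}
```

The instanceof check is monomorphic-friendly when only one representation is actually used at a call site, which is presumably why it's tolerable, if sad.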




Steven Stewart-Gallus

Jan 17, 2019, 2:41:32 AM
to mechanical-sympathy
Is there any reason you can't just manually pack the bytes together?

public int getInt(byte[] buf, int ix) {
    // Mask the low three bytes to avoid sign extension from byte to int.
    return (buf[ix] & 0xff)
         | (buf[ix + 1] & 0xff) << 8
         | (buf[ix + 2] & 0xff) << 16
         | buf[ix + 3] << 24;
}

It should be kind of slow, but probably less so than a bunch of allocations.