unsafe.allocateMemory vs ByteBuffer.allocateDirect


ymo

Feb 5, 2014, 9:15:38 AM2/5/14
to mechanica...@googlegroups.com
Hi All.

I am playing with https://github.com/real-logic/simple-binary-encoding and I notice that it uses ByteBuffer.allocateDirect instead of Unsafe.allocateMemory(). Unfortunately, allocateDirect only takes an int, so if someone wanted to allocate a huge amount of memory they are just out of luck.

1) Is this by design?
2) Can you convert memory allocated by Unsafe.allocateMemory to a ByteBuffer so that it can be used by uk.co.real_logic.sbe.codec.java.DirectBuffer?

p.s.
Kudos to Martin et al. for this library!

Peter Lawrey

Feb 5, 2014, 10:35:27 AM2/5/14
to mechanica...@googlegroups.com
1)
ByteBuffers were designed and implemented in 2002, when machines were 32-bit or had less than 2 GB of memory.

Some problems with using ByteBuffers:
- the size is limited to Integer.MAX_VALUE, i.e. 2 GB - 1!
- the data is zeroed out, which has an overhead in the clearing, and you end up touching every page, i.e. you cannot allocate virtual memory and have it turn into real memory lazily.
- every access, e.g. every byte access, has a bounds check which is not optimised away by the JVM. This makes it quite a bit slower for byte accesses. For long/double accesses it is only about 5% slower.

2)
This is what ByteBuffer does already, so "converting" wouldn't help. There is no way to sub-class ByteBuffer to address 2+ GB, as all the values are int (except for the address).
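A minimal sketch contrasting the two allocation paths Peter describes. The reflective `theUnsafe` lookup is the usual way to get hold of Unsafe outside the JDK; class and variable names here are illustrative:

```java
import java.lang.reflect.Field;
import java.nio.ByteBuffer;

public class UnsafeVsDirect {
    public static void main(String[] args) throws Exception {
        // ByteBuffer.allocateDirect takes an int, so capacity tops out at
        // Integer.MAX_VALUE (2 GB - 1), and the memory is zeroed on allocation.
        ByteBuffer bb = ByteBuffer.allocateDirect(1024);
        System.out.println("direct capacity = " + bb.capacity());

        // Unsafe.allocateMemory takes a long, so it can address more than 2 GB,
        // but the contents are NOT zeroed and accesses have no bounds checks.
        Field f = sun.misc.Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        sun.misc.Unsafe unsafe = (sun.misc.Unsafe) f.get(null);
        long address = unsafe.allocateMemory(1024);
        unsafe.putLong(address, 42L);
        System.out.println("unsafe read = " + unsafe.getLong(address));
        unsafe.freeMemory(address);  // must free manually: no GC for this memory
    }
}
```

Note the trade-off in the last line: with Unsafe you get 64-bit sizes and no zeroing cost, but you also take over the memory's lifecycle from the garbage collector.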



--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

ymo

Feb 5, 2014, 10:51:06 AM2/5/14
to mechanica...@googlegroups.com
Thank you Peter for your detailed answer!

Now I am wondering whether using Unsafe.allocateMemory was the Right (TM) thing to do. I imagine it would require quite a change though :-(

Peter Lawrey

Feb 5, 2014, 11:01:49 AM2/5/14
to mechanica...@googlegroups.com
What I do is have a Bytes interface with implementations that wrap a heap ByteBuffer, a direct ByteBuffer, Unsafe.allocateMemory, and memory-mapped files. It can be 63-bit sized. Note: it unwraps the ByteBuffer, thus bypassing some of its protections. It also supports ObjectOutput, ObjectInput, Appendable (for writing text), ByteStringParser (for parsing text), compressed types, object pooling for deserialization, and thread-safe constructs such as volatile, ordered, and atomic operations and locking. Using an Externalizable to/from Bytes is significantly faster than using ObjectInput/ObjectOutputStream.

This sort of abstraction is needed to hide the limitations or implementation details of where you get your memory from. It can even have less overhead than the thing you wrap. ;)
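Peter's actual Bytes interface is not shown in the thread; a stripped-down sketch of the idea (all names here are illustrative, not his real API) might look like:

```java
import java.nio.ByteBuffer;

// Hypothetical, stripped-down version of the idea: one long-indexed
// interface with multiple backing stores. The real Bytes interface in
// OpenHFT/Java-Lang is far richer than this.
interface Bytes {
    long capacity();                      // long, so implementations may exceed 2 GB
    byte readByte(long offset);
    void writeByte(long offset, byte b);
}

// One possible implementation, wrapping a (heap or direct) ByteBuffer.
// Its capacity is still limited by the underlying buffer's int size;
// an Unsafe-backed or memory-mapped implementation would not be.
final class ByteBufferBytes implements Bytes {
    private final ByteBuffer buffer;
    ByteBufferBytes(ByteBuffer buffer) { this.buffer = buffer; }
    public long capacity() { return buffer.capacity(); }
    public byte readByte(long offset) { return buffer.get((int) offset); }
    public void writeByte(long offset, byte b) { buffer.put((int) offset, b); }
}

public class BytesDemo {
    public static void main(String[] args) {
        Bytes bytes = new ByteBufferBytes(ByteBuffer.allocateDirect(64));
        bytes.writeByte(10, (byte) 7);
        System.out.println("read back = " + bytes.readByte(10));
    }
}
```

The point of the long-indexed interface is that calling code never needs to know whether the bytes live in a ByteBuffer, an Unsafe allocation, or a mapped file.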



ymo

Feb 5, 2014, 11:07:30 AM2/5/14
to mechanica...@googlegroups.com
Indeed... However, in this case the code that uses the ByteBuffer is generated Java code. The SbeTool generates the Java POJOs, so under the hood the Java classes can use anything as long as they provide the same interface. I would wait to hear from Martin on this one :-)



Martin Thompson

Feb 5, 2014, 12:26:24 PM2/5/14
to mechanica...@googlegroups.com
Hi,

The SBE implementations do not allocate the ByteBuffers or byte arrays used by the codecs. DirectBuffer simply wraps a ByteBuffer or byte[] via its constructor.

This is by design so that MappedByteBuffers or DirectByteBuffers can be used for the storage or network transmission of encoded data without a copy being required.

Martin...

Peter Lawrey

Feb 5, 2014, 12:38:03 PM2/5/14
to mechanica...@googlegroups.com

The reality is you can have a List<ByteBuffer> if you need 2 GB or more.
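One way to sketch that idea: a long-indexed wrapper that routes each access to the right chunk. All names and sizes here are illustrative:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: present several int-sized ByteBuffers as a single
// long-indexed region. The chunk size is a power of two so the index math
// is just a shift (by 20, since CHUNK_SIZE is 1 << 20) and a mask.
final class ChunkedBuffer {
    private static final int CHUNK_SIZE = 1 << 20;   // 1 MB chunks for the demo
    private final List<ByteBuffer> chunks = new ArrayList<>();

    ChunkedBuffer(long totalSize) {
        long remaining = totalSize;
        while (remaining > 0) {
            int size = (int) Math.min(CHUNK_SIZE, remaining);
            chunks.add(ByteBuffer.allocateDirect(size));
            remaining -= size;
        }
    }

    byte get(long index) {
        return chunks.get((int) (index >>> 20)).get((int) (index & (CHUNK_SIZE - 1)));
    }

    void put(long index, byte b) {
        chunks.get((int) (index >>> 20)).put((int) (index & (CHUNK_SIZE - 1)), b);
    }
}

public class ChunkedBufferDemo {
    public static void main(String[] args) {
        ChunkedBuffer buf = new ChunkedBuffer(3L << 20);  // 3 MB across 3 chunks
        buf.put((2L << 20) + 5, (byte) 42);               // lands in the third chunk
        System.out.println("value = " + buf.get((2L << 20) + 5));
    }
}
```

As the later posts note, this keeps NIO usable per chunk, but every access pays the extra indirection, and writes that straddle a chunk boundary need special handling (not shown here).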


ymo

Feb 5, 2014, 1:31:09 PM2/5/14
to mechanica...@googlegroups.com
Martin, I am sorry if I am restating the question, but are you saying that if I had a chunk of memory that was allocated in C, I cannot pass that memory around for network or file I/O in Java unless I did a memcpy into an actual ByteBuffer?

Thank you in advance.

Norman Maurer

Feb 5, 2014, 1:34:59 PM2/5/14
to ymo, mechanica...@googlegroups.com
There is a JNI call (NewDirectByteBuffer) to create a ByteBuffer from a pointer. I think this is what you want:



-- 
Norman Maurer

ymo

Feb 5, 2014, 1:40:10 PM2/5/14
to mechanica...@googlegroups.com, ymo, norman...@googlemail.com
Norman, as Peter already said:

"2) This is what ByteBuffer does already so "converting" wouldn't help.  There is no way to sub-class ByteBuffer to address 2+ GB as all the values are int (except for the address)"



ymo

Feb 5, 2014, 1:44:34 PM2/5/14
to mechanica...@googlegroups.com
Peter, I only deal with the ByteBuffer when I need to allocate the memory and when I need to do network and file I/O. The rest of the time the actual buffer being used is hidden behind the generated code. So abstracting the storage to a List<ByteBuffer> is both non-optimal and non-trivial.

Rajiv Kurian

Feb 5, 2014, 2:03:01 PM2/5/14
to mechanica...@googlegroups.com
AFAIK, if you are using a library that uses ByteBuffers or wrappers around them, then you cannot really go beyond the 2 GB limit. You can change ByteBuffers via reflection to point to a chunk of memory that you allocate, but that doesn't get around the limit.

If you are building your own abstractions, then using a list of ByteBuffers with wrapper methods can help. This allows you to still use NIO to fill in the ByteBuffers, then put them in a class that provides access to the underlying content. If you want contiguous chunks of memory greater than 2 GB, then you can possibly allocate them using Unsafe, but you are on your own as to how you do I/O with the chunk you receive. There are JNI wrappers around epoll available that can help you here if network I/O is what you need. It might even be faster than the epoll SelectorImpl in Java. But again you are on your own when it comes to processing these chunks, since there are no established conventions on how to represent, access and modify chunks greater than 2 GB. Most libraries don't take a (pointer, size) pair of longs.

Peter Lawrey

Feb 5, 2014, 3:35:57 PM2/5/14
to mechanica...@googlegroups.com

JNI is the best option, but you can also create an empty direct ByteBuffer and set its address and capacity via reflection.

Kevin Burton

Feb 5, 2014, 6:02:22 PM2/5/14
to mechanica...@googlegroups.com
This is a great thread btw... just replying to a few of these inline.

The entire mmap/malloc system in Java is broken, in my opinion. It should be abandoned and another API provided.

What I did was create JNA bindings for malloc/mmap directly.

They return 64-bit pointers directly, and I just use these directly from my API.

In theory you could return a ByteBuffer if you wanted, by reflecting into it and changing the inner address, but I found it's much better to provide an entirely new API.

I created ByteSlabs, which are just like ByteBuffer but 64-bit and use JNA.

One issue is that JNA is slower, but I don't really do small mallocs... it's mostly mmap of 2 GB files, and I don't do that too often.

Interestingly enough, I wasn't able to get performance on par with the JVM. I'm within 20% or so... I think this is because some of the putXX methods are intrinsics (at least that's what others have suggested on this list).

Another area of optimization could be not zeroing out the allocated memory, but in practice I didn't find that this was a significant optimization.

It could be a problem with my benchmarks, or the fact that the JVM already optimizes this.

Peter Lawrey

Feb 5, 2014, 6:08:20 PM2/5/14
to mechanica...@googlegroups.com

From my tests, not zeroing out helps on Linux but not on Windows.

You can use reflection to get the raw map() and unmap() native methods on FileChannelImpl. The low-level native methods all support 64-bit sizes.


Jin Mingjian

Feb 5, 2014, 8:52:33 PM2/5/14
to mechanica...@googlegroups.com
ByteBuffer.allocateDirect is based on Unsafe.allocateMemory, so the latter is the more general of the two.

For memory zeroing, the spec (javadoc) of Unsafe.allocateMemory explicitly makes no guarantee, so you may get garbage back from Unsafe.allocateMemory. This is why DirectByteBuffer zeroes the memory itself. Specifically, under Linux, Unsafe.allocateMemory uses glibc's malloc, which will use mmap to allocate the chunk when the chunk is larger than some threshold (2 MB+). mmap on Linux guarantees zeroed memory for security reasons, but it is very slow compared to small-chunk allocation.

So I recommend you use a pool-like design for large sizes, which is what my Landz memory allocation subsystem's global pool does. You should also know that the Unsafe.allocateMemory wrapper is significantly slower than native glibc malloc (and even more so than jemalloc). If you use it frequently, you may need to take care with this.

As for some of the problems with ByteBuffer, I avoided them in designing Landz's memory allocation and buffer subsystem. For now I restrict the allocation size to < 2 MB for the first prototype (but the API is designed with large memory usage in mind). I will add GB-sized chunk support soon, although I think such large chunks are only needed by a few restricted use cases.

best regards,
Jin



Peter Lawrey

Feb 6, 2014, 2:15:49 AM2/6/14
to mechanica...@googlegroups.com

On the Ubuntu system I have, allocations larger than 128 KB use mmap, which also has a problem: there appears to be a limit to how many mmaps your program can have, and it is pretty easy to reach. Needs more research...

Jin Mingjian

Feb 6, 2014, 3:12:23 AM2/6/14
to mechanica...@googlegroups.com
Peter, thanks for sharing your case. You are right: the 2 MB static threshold seems out of date. More can be seen at http://man7.org/linux/man-pages/man3/mallopt.3.html:

"Note: Nowadays, glibc uses a dynamic mmap threshold by default. The initial value of the threshold is 128*1024, but when blocks larger than the current threshold and less than or equal to DEFAULT_MMAP_THRESHOLD_MAX are freed, the threshold is adjusted upward to the size of the freed block."

So it seems the safe value for mmap via glibc malloc is DEFAULT_MMAP_THRESHOLD_MAX (32 MB on 64-bit Linux), and the safe value for sbrk (small allocation) is M_MMAP_THRESHOLD (128 KB).

Could your "how many mmaps" limit come from the RLIMIT_MEMLOCK resource limit (http://man7.org/linux/man-pages/man2/getrlimit.2.html)?

Jin

Raymond Manaloto

Mar 3, 2014, 10:56:36 AM3/3/14
to mechanica...@googlegroups.com
I am coming across the same issues with the simple-binary-encoding project when trying to store/retrieve messages in a memory-mapped file greater than 2 GB / Integer.MAX_VALUE.

I created an issue in their project to see if they can help tackle the problem (https://github.com/real-logic/simple-binary-encoding/issues/97).

Peter Lawrey

Mar 3, 2014, 6:55:38 PM3/3/14
to mechanica...@googlegroups.com
In OpenHFT/Java-Lang I have added support for memory-mapped files of 63-bit sizes in a single mapping. You might find something useful there: https://github.com/OpenHFT/Java-Lang/blob/master/lang/src/main/java/net/openhft/lang/io/MappedFile.java

This only works on OpenJDK/HotSpot Java 7 and 8.




Martin Thompson

Mar 4, 2014, 7:51:08 AM3/4/14
to mechanica...@googlegroups.com
As mentioned in the issue ticket, I think a good approach would be to provide a duck-typed interface so that the underlying buffer can be long- or int-indexed. This can also be useful for resizing strategies when the buffer needs to be extended as flyweights write forward for large messages. If it were purely long-indexed, with indirection and selection, then the vast majority of uses would take a performance hit for the sake of the minority.

The duck typed class can be provided during the stub generation phase for SBE.

Does this work for your requirements?

Peter Lawrey

Mar 4, 2014, 2:32:33 PM3/4/14
to mechanica...@googlegroups.com
You can use/store int offsets as well, and you can use some tricks to address more than 4 GB if you need that, but you need to end up with a 64-bit address in the end. The biggest cost for large data structures is the fact that you are going off-cache a lot of the time and getting TLB misses. In the latter case I have seen worst-case latency better on a Haswell than on an Ivy Bridge processor with a larger L3 cache but a smaller TLB cache.

For example, SharedHashMap uses multiple segments which are also allocation arenas. As I assume records are of fixed size, you will never need more than 2 bn records per segment. I have even considered using 16-bit record indexes, allowing 64K entries per segment.
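The int-offset trick Peter describes could be sketched like this (the segment layout and names are made up for illustration, not taken from SharedHashMap):

```java
// Illustrative only: address more than 4 GB while storing compact int
// offsets, by keeping a small segment number plus an int offset within
// the segment, and only forming the full 64-bit address at the last moment.
public class SegmentedAddressDemo {
    static final long SEGMENT_SIZE = 1L << 32;   // 4 GB per segment

    // (segment, offset) -> 64-bit address relative to some base
    static long toAddress(int segment, int offset) {
        // mask the offset to treat it as unsigned, so the full 4 GB is usable
        return segment * SEGMENT_SIZE + (offset & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        // offset -1, read as unsigned, is 4294967295: the last byte of segment 2
        long addr = toAddress(2, -1);
        System.out.println("address = " + addr);
    }
}
```

The payoff is that records can store 4-byte offsets instead of 8-byte pointers, halving the pointer footprint, while the arena base plus segment number recovers the real address when needed.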



Martin Thompson

Apr 2, 2014, 4:57:38 AM4/2/14
to mechanica...@googlegroups.com
I've just added support to SBE for wrapping an off-heap address that is allocated with Unsafe or via a JNI call.




ymo

Apr 2, 2014, 3:18:15 PM4/2/14
to mechanica...@googlegroups.com
Martin, you are da maaaaan ))) One quick question, however:

The allocation is still limited to an int as far as capacity is concerned, so what is the new advantage? I would assume that the only reason someone would want to use Unsafe.allocateMemory would be to circumvent the capacity limit as well as the bounds checks on the buffer, which cannot otherwise be avoided.

Peter Lawrey

Apr 2, 2014, 4:03:19 PM4/2/14
to mechanica...@googlegroups.com


I would add:
- it doesn't zero out the memory, which can make it significantly faster, especially on Linux;
- it doesn't create an object;
- it can be split up however you like;
- you can take memory alignment into account by knowing the underlying address;
- you can free it deterministically (you can use the cleaner to do this with a ByteBuffer).
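Two of those points, alignment control and deterministic freeing, can be shown in a few lines. This is a generic over-allocate-and-round-up sketch, not code from any library mentioned in the thread:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Illustrative: because Unsafe hands you the raw address, you can align
// allocations yourself, e.g. to a 64-byte cache line, and free them the
// moment you are done rather than waiting for the garbage collector.
public class AlignedAllocDemo {
    public static void main(String[] args) throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        int size = 1024, align = 64;
        // over-allocate, then round the address up to the next 64-byte boundary
        long raw = unsafe.allocateMemory(size + align - 1);
        long aligned = (raw + align - 1) & ~(long) (align - 1);

        System.out.println("aligned to 64 bytes = " + (aligned % 64 == 0));
        unsafe.freeMemory(raw);  // deterministic free: always free the raw pointer
    }
}
```

Note that freeMemory must be given the original raw pointer, not the rounded-up one, which is one reason real allocators keep both values.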
 

Martin Thompson

Apr 2, 2014, 4:39:28 PM4/2/14
to mechanica...@googlegroups.com
I think Peter covered the benefits really well. In addition, you can very cheaply move this as a window around a very large memory region that is off-heap. Check out the wrap(address, capacity) method to move it around. The DirectBuffer also allows you to duplicate a direct ByteBuffer that you can use for I/O without a copy. If you duplicate it, the cleaner is nulled out, so you retain control of the memory lifecycle.

ymo

Apr 2, 2014, 4:44:45 PM4/2/14
to mechanica...@googlegroups.com
Peter, Martin .. thank you all. One greatly satisfied *customer* )))