Potential performance improvement to netty unsafe buffers.


Kevin Burton

Jul 25, 2013, 12:39:08 AM
to ne...@googlegroups.com
So I've been spending more time investigating issues with memory allocation and buffer performance with Java.

One thing I discovered is that the code in the JVM is horrible. I found at least 2-3 bugs, plus performance issues.

Another issue, as it applies to Netty, is that when direct buffers are allocated the JVM takes a synchronized lock to bump up a reserved memory counter.

GRANTED Netty could somewhat bypass that by using biased locking.

However, if Netty were to just go straight to using unsafe.allocateMemory (including the cache alignment that DirectByteBuffer does) then it would be even faster.

It wouldn't support the MAX_DIRECT_MEMORY setting (-XX:MaxDirectMemorySize) that the JVM supports, BUT that feature has a bug in it with page cache alignment anyway.
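
To make the idea concrete, here is a minimal sketch of going straight to Unsafe with page alignment. The class name AlignedUnsafeBlock and its methods are hypothetical (not Netty or JDK code), and it assumes sun.misc.Unsafe is reachable through reflection on the running JVM. It over-allocates by one page and rounds the base address up, and it does no reserved-memory accounting at all.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical helper, not Netty or JDK code: a page-aligned off-heap block.
final class AlignedUnsafeBlock {
    private static final Unsafe UNSAFE = loadUnsafe();

    private final long base;     // pointer returned by allocateMemory; required by freeMemory
    private final long address;  // first page-aligned address inside the allocation
    private final long capacity; // usable bytes starting at address

    AlignedUnsafeBlock(long capacity) {
        if (capacity < 0) {
            throw new IllegalArgumentException("capacity < 0");
        }
        long pageSize = UNSAFE.pageSize();
        // Over-allocate by one page so that an aligned address always fits.
        this.base = UNSAFE.allocateMemory(capacity + pageSize);
        this.address = alignUp(base, pageSize);
        this.capacity = capacity;
    }

    // Rounds value up to the next multiple of alignment (alignment must be a power of two).
    static long alignUp(long value, long alignment) {
        return (value + alignment - 1) & -alignment;
    }

    static int pageSize() {
        return UNSAFE.pageSize();
    }

    long address() {
        return address;
    }

    long capacity() {
        return capacity;
    }

    // No bounds checks here: the caller must keep index within [0, capacity).
    byte getByte(long index) {
        return UNSAFE.getByte(address + index);
    }

    void putByte(long index, byte value) {
        UNSAFE.putByte(address + index, value);
    }

    // Must be called exactly once; frees the original (unaligned) pointer.
    void free() {
        UNSAFE.freeMemory(base);
    }

    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }
}
```

The aligned address is what callers read and write through, but the original base pointer has to be kept around because that is what freeMemory expects.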

Norman Maurer

Jul 25, 2013, 1:10:09 AM
to ne...@googlegroups.com
Hey Kevin,

Thanks for your investigation. Another thing I noticed when benchmarking vert.x (which uses Netty 4) was that, with UnpooledByteBufAllocator, releasing direct ByteBuf instances created a hotspot.
This is because we use the Cleaner here, which uses some methods that are static synchronized (ouch!). So my idea was to just get rid of this and use JNI ;)

It's still a work in progress and I need to run it through a proper benchmark, but maybe you are interested:

Also, what page cache alignment are you talking about? It may be worth building it into the native impl.



Kevin Burton

Jul 25, 2013, 2:59:49 PM
to ne...@googlegroups.com, norman...@googlemail.com
Honestly I would just abandon using ByteBuffer entirely.  Internally the JVM is using Unsafe anyway and you're just asking for trouble because honestly a lot of the code here is just flat out horrible.

I posted a lot of my writeup to the mechanical sympathy group.

There is explicit GC invocation, duplicate code blocks, and in one place an errant Thread.sleep( 100 ) with no explanation why the hell you're sleeping for 100ms.

This documents the page cache alignment issue:


and here's the JVM bug regarding the incorrect 'reserved memory' counter.


My plan is to use Unsafe for everything, including allocateMemory to allocate and freeMemory to release.  I'm going to bypass the entire JVM here as all the code is ugly, bug ridden, and has far too many locks.

For example, I was unaware of the synchronized methods in Cleaner!  Ouch.  Can you create a gist of this?  Perhaps some of us should write up a criticism of direct memory handling in the JVM and post it to the OpenJDK list along with examples of what is broken.

Unsafe makes this stuff much "easier" and at least the gotchas there can be fixed.  

I had been using Netty ChannelBuffers within my code for things like sorting data and I realized that I have different requirements and it would be best for me to implement my own buffer system. 

One issue is that I only need about 3-4 methods.  The other is that I need a 64-bit pointer so I can work with memory > 2GB.  That, plus the performance benchmarks showing that Netty was slower with the Unsafe implementation, was enough to push me in this direction.

I implemented a version of an UnsafeByteSlab (ByteSlab is the name of the interface I'm using here) which is, oddly enough, 5% slower than DirectByteBuffers.  One issue was cache alignment, which I fixed; the other issue is that I'm doing too much range assertion and I can back off a bit...
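
As a rough illustration of the range assertion cost with 64-bit indices, the check can be folded into a single overflow-safe comparison. This is only a sketch: SlabBounds, inRange and checkRange are hypothetical names, not the actual ByteSlab API, and it assumes capacity is never negative.

```java
// Hypothetical helper for a long-indexed slab; names are illustrative only.
final class SlabBounds {
    private SlabBounds() {
    }

    // True when [index, index + length) lies inside [0, capacity).
    // Written as index <= capacity - length so that index + length can never overflow.
    static boolean inRange(long index, long length, long capacity) {
        return index >= 0 && length >= 0 && index <= capacity - length;
    }

    // Throws when the requested range falls outside the slab.
    static void checkRange(long index, long length, long capacity) {
        if (!inRange(index, length, capacity)) {
            throw new IndexOutOfBoundsException(
                    "index=" + index + ", length=" + length + ", capacity=" + capacity);
        }
    }
}
```

For a 3GB slab, checkRange(capacity - 8, 8, capacity) passes while checkRange(capacity - 7, 8, capacity) throws. A naive index + length > capacity test would wrap around for very large indices, which matters once the index is a long.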

I'll post the source here when I'm done.. you can see it now if you want but it's still in development.  I imagine you can just take my code and put it into Netty directly.

You should also implement a Netty setting for the equivalent of "MAX_DIRECT_MEMORY" so that you can give Netty a HARD limit on memory, so that the bug doesn't actually cause the OOM killer to kick in.
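
A hard limit like this does not need a synchronized block. Below is a minimal sketch using a compare-and-set loop on an AtomicLong; DirectMemoryBudget and its methods are hypothetical names, not an existing Netty setting or API.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical lock-free budget for off-heap memory; not an existing Netty class.
final class DirectMemoryBudget {
    private final long maxBytes;
    private final AtomicLong reserved = new AtomicLong();

    DirectMemoryBudget(long maxBytes) {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes < 0");
        }
        this.maxBytes = maxBytes;
    }

    // Reserves bytes if the hard limit allows it; never blocks and never takes a lock.
    boolean tryReserve(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes < 0");
        }
        while (true) {
            long current = reserved.get();
            long next = current + bytes;
            // next < current catches arithmetic overflow of the counter.
            if (next > maxBytes || next < current) {
                return false;
            }
            if (reserved.compareAndSet(current, next)) {
                return true;
            }
            // Another thread changed the counter first; retry with the fresh value.
        }
    }

    // Returns previously reserved bytes to the budget.
    void release(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes < 0");
        }
        reserved.addAndGet(-bytes);
    }

    long reservedBytes() {
        return reserved.get();
    }
}
```

An allocator would call tryReserve before Unsafe.allocateMemory and release after freeMemory, and fail the allocation up front when tryReserve returns false instead of letting the process grow until the OOM killer steps in.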

Kevin

Norman Maurer

Jul 26, 2013, 1:40:46 AM
to ne...@googlegroups.com
Hey Kevin,

Once I start to implement a native transport (using JNI) I will for sure not use ByteBuffer at all. The native stuff I showed you was just to remove the hotspot of synchronization in Cleaner and still use the JDK-provided Channel abstraction for networking.

Thanks again for all of your info and links, I'm really interested to read more about your findings in the future!

Bye,
Norman

Kevin Burton

Jul 26, 2013, 1:23:47 PM
to ne...@googlegroups.com, norman...@googlemail.com
> once I start to implement a native transport (using JNI)

Is there any more information on this?  What's a "native transport" ?

Kevin

Kevin Burton

Jul 26, 2013, 1:43:27 PM
to ne...@googlegroups.com, norman...@googlemail.com
As long as you're going to go native, read the MegaPipe paper.  There are some good ideas there and you might be able to get a lot more performance than just epoll.

Some of the points they made I wasn't aware of... specifically that the socket() API is hindered by using file handles, because POSIX requires the kernel to return the lowest free file descriptor.

There's also a good video about using similar techniques for 10M connections.  I can't find it at the moment but I think I posted it here.

Rajiv Kurian

Jul 26, 2013, 2:22:25 PM
to ne...@googlegroups.com
My understanding is that MegaPipe requires a patch to the Linux kernel. It's still very much an experiment. The C10M connection video that I think you are referring to requires a user space networking stack. Most of these are not open source. PF_RING, which is OSS, works on Linux but support is still patchy. Not to mention one still has to build a custom TCP/IP stack on top.
IMHO these options are great for custom software to be deployed in a specific environment (patched kernel, specific NIC with a user space driver available, custom TCP/IP implementation) but not usable by anything remotely generic. I don't see how any of these solutions will reach the mass market without widespread OS support plus JVM support for Java users. One could use JNI to overcome lack of JVM support, but the kernel is a blocker. Sadly I think today these exotic options are only viable for a small subset of people with extreme control over their deployment environment and the expertise or money to build a bug free end to end solution, i.e. HFT shops :)

Kevin Burton

Jul 26, 2013, 4:42:38 PM
to ne...@googlegroups.com
There were some other suggestions in both places that I think can apply to Netty... but of course I haven't dived down into the low level Netty details to be 100% certain.

And I agree that MegaPipe, PF_RING would be VERY custom ... but if it were possible to keep Netty abstract enough, it might be possible to port Netty to a MegaPipe-style framework in the future.

And you're right that a LOT of this is going to require some work in both the kernel and JVM space.

But if someone DID do the work to get this to work in Netty including the kernel work it would really kick ass... :)

And I think that in theory all the JNI work could work on the JVM without a problem.

Rajiv Kurian

Jul 26, 2013, 9:10:47 PM
to ne...@googlegroups.com
What do you think is actionable right now, given that MegaPipe requires new kernel APIs and that custom networking stacks require specific NICs, specific drivers and building a TCP/IP stack? The whole point of user space networking solutions is that you do not build a generic TCP/IP stack and instead customize heavily to satisfy your needs.


On Friday, July 26, 2013 1:42:38 PM UTC-7, Kevin Burton wrote:
> But if someone DID do the work to get this to work in Netty including the kernel work it would really kick ass... :)

How would some one get this into Netty without having depending on a custom build of Linux? 

Rajiv Kurian

Jul 26, 2013, 9:20:56 PM
to ne...@googlegroups.com


* How would some one get this into Netty without depending on a custom build of Linux? 

Kevin Burton

Jul 26, 2013, 9:27:18 PM
to ne...@googlegroups.com
I'm not implementing this so I'm not sure... my whole point was just to suggest reviewing this work to SEE if anything was actionable.

Actually, the syscall coalescing is probably actionable but I am not sure if Norman wants to bite off more work.

We need that in general though. If Peregrine had syscall coalescing that would rock!
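
For context, the closest thing plain NIO offers today is a gathering write, which hands several buffers to the channel in one call. This is a different and much narrower technique than the cross-channel batching MegaPipe describes, and whether it ends up as a single system call depends on the JDK and platform. A small sketch with hypothetical names (GatheringWriteSketch, writeGathered, roundTrip), using a file channel so it runs anywhere:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: several buffers handed to the channel in one write call.
final class GatheringWriteSketch {
    private GatheringWriteSketch() {
    }

    // Writes all parts to target using gathering writes and returns the byte count.
    static long writeGathered(Path target, String... parts) {
        ByteBuffer[] buffers = new ByteBuffer[parts.length];
        long total = 0;
        for (int i = 0; i < parts.length; i++) {
            buffers[i] = ByteBuffer.wrap(parts[i].getBytes(StandardCharsets.UTF_8));
            total += buffers[i].remaining();
        }
        long written = 0;
        try (FileChannel channel = FileChannel.open(target,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            // One call may drain every buffer; loop in case of a partial write.
            while (written < total) {
                written += channel.write(buffers);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }

    // Writes the parts to a temporary file and reads the result back.
    static String roundTrip(String... parts) {
        try {
            Path tmp = Files.createTempFile("gather", ".txt");
            try {
                writeGathered(tmp, parts);
                return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

SocketChannel implements the same GatheringByteChannel interface, so the write(ByteBuffer[]) call looks identical for a network connection.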



Norman Maurer

Jul 29, 2013, 1:32:47 AM
to ne...@googlegroups.com
I'm still in the "thinking process" so bear with me a bit ;)


---
Norman Maurer

JBoss, by Red Hat



Kevin Burton

Jul 29, 2013, 12:45:03 PM
to ne...@googlegroups.com, nma...@redhat.com
We're just going to give you more work in the meantime ;)