Netty: java.lang.OutOfMemoryError: Direct buffer memory


Stan

Jan 16, 2012, 7:22:47 PM1/16/12
to Akka User List
We're seeing the error listed below in a production setup. The number
of nodes in the cluster is much larger than in our normal dev
environment, and I wonder if that is related to the problem. There
was a thread from Nov 2011 discussing a bug in Netty and proposed
workarounds. At the time, the bug appeared to be sensitive to the JDK,
kernel, etc., and showed up on Mac but not Ubuntu.

This Netty bug seems to be the problem: https://issues.jboss.org/browse/NETTY-424

Has anybody seen or better yet fixed this?

We're running Akka 1.2 at the moment with Netty 3.2.5.Final.


--------------------------------------------------------------

java.lang.OutOfMemoryError: Direct buffer memory

[ERROR] [1/16/12 11:57 PM] [akka:event-driven:dispatcher:global-15] [Switch] Direct buffer memory
java.lang.OutOfMemoryError: Direct buffer memory
    at java.nio.Bits.reserveMemory(Bits.java:633)
    at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:98)
    at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
    at org.jboss.netty.channel.socket.nio.SocketSendBufferPool$Preallocation.<init>(SocketSendBufferPool.java:155)
    at org.jboss.netty.channel.socket.nio.SocketSendBufferPool.<init>(SocketSendBufferPool.java:42)
    at org.jboss.netty.channel.socket.nio.NioWorker.<init>(NioWorker.java:84)
    at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.<init>(NioClientSocketPipelineSink.java:74)
    at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.<init>(NioClientSocketChannelFactory.java:135)
    at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.<init>(NioClientSocketChannelFactory.java:105)
    at akka.remote.netty.ActiveRemoteClient$$anonfun$connect$1.apply$mcV$sp(NettyRemoteSupport.scala:431)
    at akka.util.Switch.liftedTree1$1(LockUtil.scala:126)
    at akka.util.Switch.transcend(LockUtil.scala:125)
    at akka.util.Switch.switchOn(LockUtil.scala:138)
    at akka.remote.netty.ActiveRemoteClient.connect(NettyRemoteSupport.scala:427)
    at akka.remote.netty.NettyRemoteClientModule$class.withClientFor(NettyRemoteSupport.scala:112)
    at akka.remote.netty.NettyRemoteSupport.withClientFor(NettyRemoteSupport.scala:649)
    at akka.remote.netty.NettyRemoteClientModule$class.send(NettyRemoteSupport.scala:94)
    at akka.remote.netty.NettyRemoteSupport.send(NettyRemoteSupport.scala:649)
    at akka.actor.LocalActorRef.postMessageToMailbox(ActorRef.scala:862)
    at akka.actor.ScalaActorRef$class.$bang(ActorRef.scala:1398)
    at akka.actor.LocalActorRef.$bang(ActorRef.scala:605)

√iktor Ҡlang

Jan 16, 2012, 7:37:23 PM1/16/12
to akka...@googlegroups.com
Hi Stan,

I'm currently working on sorting out a remoting issue on large clusters for 1.3-RC7;
it would be very helpful if you could try switching to that when it's released.

Aside from that, I would recommend setting a long read-timeout in your config.
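(For reference, a sketch of what that setting would look like in the Akka 1.x config file. The key name and units are my best guess, not confirmed in this thread; verify against the akka-reference.conf shipped with your Akka version.)

```
akka {
  remote {
    client {
      # Assumed key: read-timeout in seconds; raise it so idle remote
      # connections aren't torn down and re-bootstrapped too often.
      read-timeout = 3600
    }
  }
}
```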

Cheers,


--
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To post to this group, send email to akka...@googlegroups.com.
To unsubscribe from this group, send email to akka-user+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/akka-user?hl=en.




--
Viktor Klang

Akka Tech Lead
Typesafe - The software stack for applications that scale

Twitter: @viktorklang

Stan

Jan 16, 2012, 8:07:14 PM1/16/12
to Akka User List
Thanks, Viktor.

We increased the available memory for those processes (they were set at
128m for some reason), and if the error still shows up, we'll try
setting the timeout higher.
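(Some context not in the thread: direct buffers are allocated off the Java heap, and their total is capped by -XX:MaxDirectMemorySize, which by default roughly tracks the maximum heap size, so a small -Xmx also implies a small direct-buffer budget. A minimal probe:)

```java
import java.nio.ByteBuffer;

public class DirectProbe {
    public static void main(String[] args) {
        // Direct buffers live outside the Java heap; java.nio.Bits.reserveMemory
        // (top of the stack trace in the first message) throws OutOfMemoryError
        // once the total exceeds the -XX:MaxDirectMemorySize cap.
        ByteBuffer buf = ByteBuffer.allocateDirect(1024 * 1024); // 1 MiB off-heap
        System.out.println(buf.isDirect());
        System.out.println(buf.capacity());
    }
}
```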

The test code in the Netty ticket does this to reproduce the issue:

for (int i = 0; i < Integer.MAX_VALUE; i++) {
    ChannelFactory channelFactory = new NioClientSocketChannelFactory(
            Executors.newCachedThreadPool(),
            Executors.newCachedThreadPool());
    Bootstrap bootstrap = new ClientBootstrap(channelFactory);
    bootstrap.setPipelineFactory(new ChannelPipelineFactory() { ... });
    ChannelFuture future = bootstrap.connect(serverAddress);
    future.await(); // or future.addListener(new ChannelFutureListener() { ... });
    Channel channel = future.getChannel();
    channel.close();
    // FIXME NioClientSocketChannelFactory direct buffer memory leak
    channelFactory.releaseExternalResources(); // or bootstrap.releaseExternalResources();
}

So it appears that bootstrapping clients can leak direct buffer memory
if it happens "too often".
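(Going by the ticket's FIXME comment, one mitigation would be to construct the ChannelFactory once and share it across all connects, releasing its external resources only at shutdown. This is a sketch against the Netty 3.2.x API, not a fix confirmed in this thread; connectCount, pipelineFactory, and serverAddress are assumed to be defined elsewhere.)

```java
// One factory, and thus one set of direct-buffer send pools, shared by
// every connect, instead of a fresh factory per connection attempt.
ChannelFactory channelFactory = new NioClientSocketChannelFactory(
        Executors.newCachedThreadPool(),
        Executors.newCachedThreadPool());
try {
    for (int i = 0; i < connectCount; i++) {
        ClientBootstrap bootstrap = new ClientBootstrap(channelFactory);
        bootstrap.setPipelineFactory(pipelineFactory);
        ChannelFuture future = bootstrap.connect(serverAddress).awaitUninterruptibly();
        if (future.isSuccess()) {
            future.getChannel().close().awaitUninterruptibly();
        }
    }
} finally {
    channelFactory.releaseExternalResources(); // once, at shutdown
}
```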

Anyway, I'll post results here and bug the Netty folks too.

√iktor Ҡlang

Jan 18, 2012, 5:02:08 AM1/18/12
to akka...@googlegroups.com
Hi Stan,

Did you increase the read timeout?
Have you eliminated errors in remoting, so that the remote client doesn't need to be tossed away too frequently?

Cheers,

Stan

Jan 18, 2012, 2:00:25 PM1/18/12
to Akka User List
So far, so good. We increased the per-process memory as well as the
read timeout. I haven't heard back from Ops (they're on to another
problem, I think), but no news is likely good news.

I'm still investigating the issue, however, and trying to be confident
that the increased values will solve the problem.

I created a server-side script which just starts a remote server on a
given port, registers a stubby printy-thingy actor per session, and
sits there. I started 5, 10, 20, 42, and 100 of these servers on an
Ubuntu Linux system and connected to them from the REPL on my laptop.

From the console, I'm able to obtain remote actor references for each
of the server processes. Since these servers are on separate ports, I'm
assuming this means a separate remote client for each connection.
After that, I send pings to all the actors to exercise the
connections. I see the connections open and subsequently close
normally, messages are received, and no errors have been reported.

I'm going to repeat the experiment with the client REPL running on the
server machine as well. The original post mentioned a difference
between Mac and Ubuntu (if I remember right) and noted a problem on
the former but none on the latter.


√iktor Ҡlang

Jan 18, 2012, 2:35:30 PM1/18/12
to akka...@googlegroups.com

Great news, Stan!
