Latest Pushy Version and OutOfMemory "Direct buffer memory"

889 views

Mark Salm

unread,
Oct 12, 2014, 10:08:34 AM10/12/14
to pushy...@googlegroups.com
Hello,

We have just installed the latest stable version of Pushy on a Linux machine with 4GB memory.  This is a dedicated server.

After sending a number of pushes, we get an out of memory exception, because of Netty (we are using 4.0.23):


[nioEventLoopGroup-2-1] DEBUG com.relayrides.pushy.apns.ApnsConnection - NabdPushManager-connection-0 caught an exception.
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:659)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:113)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:305)
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:434)
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:168)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:98)
at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:241)
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
at io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:83)
at io.netty.handler.ssl.SslHandler.allocate(SslHandler.java:1265)
at io.netty.handler.ssl.SslHandler.allocateOutNetBuf(SslHandler.java:1275)
at io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:453)
at io.netty.handler.ssl.SslHandler.flush(SslHandler.java:432)
at io.netty.channel.ChannelHandlerInvokerUtil.invokeFlushNow(ChannelHandlerInvokerUtil.java:165)
at io.netty.channel.DefaultChannelHandlerInvoker.invokeFlush(DefaultChannelHandlerInvoker.java:355)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:297)
at io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:115)
at io.netty.channel.ChannelHandlerInvokerUtil.invokeFlushNow(ChannelHandlerInvokerUtil.java:165)
at io.netty.channel.DefaultChannelHandlerInvoker.invokeFlush(DefaultChannelHandlerInvoker.java:355)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:307)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:313)
at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:994)
at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:258)
at com.relayrides.pushy.apns.ApnsConnection$3.run(ApnsConnection.java:532)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:343)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:356)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:834)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:701)



We have tried to set a limit for the number of PushQueues, and we set them at 100.

We also tried to set such parameters to increase memory limits:  -Xms512m -Xmx512m -XX:MaxDirectMemorySize=512m

However, we are still getting the above exception, under heavy load.

Any ideas on how to resolve this?

Thanks.

Mark



Jon Chambers

unread,
Oct 13, 2014, 11:37:32 PM10/13/14
to Mark Salm, pushy...@googlegroups.com
Oops; replied to the author, but not to the list. Original reply below for posterity.
----

Hi there!

That's (obviously) not good. Could you tell us a little more about your setup? How many PushManagers do you have in play, and how many concurrent connections do they have? How many notifications (thousands? millions?) do you send before this becomes a problem? Is this new behavior in v0.4, or did you also see this in v0.3?

Cheers!

-Jon

--
Pushy is an open-source Java library for sending APNs (iOS and OS X) push notifications. Pushy is brought to you by the engineers at RelayRides.
---
You received this message because you are subscribed to the Google Groups "pushy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pushy-apns+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ngoc Dao

unread,
Oct 14, 2014, 12:10:19 AM10/14/14
to pushy...@googlegroups.com
Pushy uses Netty 4, which uses native memory for speed.
But that can cause out-of-memory errors if Pushy doesn't release memory properly.
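As a quick illustration of what "native memory" means here (a toy, JDK-only sketch, not Pushy or Netty code): direct buffers live outside the Java heap, and their total size is capped by -XX:MaxDirectMemorySize rather than by -Xmx.

```java
import java.nio.ByteBuffer;

public class DirectBufferDemo {
    // Allocates a buffer outside the Java heap. Such memory is counted against
    // -XX:MaxDirectMemorySize rather than the -Xmx heap limit, which is why the
    // heap can look nearly empty while "Direct buffer memory" is exhausted.
    static ByteBuffer allocateDirect(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    public static void main(String[] args) {
        ByteBuffer direct = allocateDirect(1024 * 1024);
        System.out.println(direct.isDirect());  // backed by native memory
    }
}
```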

Please try enabling ADVANCED memory leak detection level:

In your code:
import io.netty.util.ResourceLeakDetector
ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.ADVANCED)

Or add runtime option:
java -Dio.netty.leakDetectionLevel=advanced ...

By default, ResourceLeakDetector is enabled but at SIMPLE level:
http://netty.io/wiki/reference-counted-objects.html#wiki-h3-11
http://netty.io/4.0/api/io/netty/util/ResourceLeakDetector.Level.html


Mark Salm

unread,
Oct 14, 2014, 3:23:23 AM10/14/14
to pushy...@googlegroups.com, bac...@yabila.com
Hi Jon,

Thanks for the reply. Here are the answers:

- Could you tell us a little more about your setup?   

Amazon EC2 Linux Box, 4GB memory, Java 1.6.0_32, Pushy 0.4, Netty 4.0.23

I've also attached the code we are using.

How many PushManagers do you have in play?

A single PushManager, using the queue.

How many concurrent connections do they have? 

We didn't configure any parameter for concurrent connections, so I presume the defaults are in effect. What do you mean by this?

- How many notifications (thousands? million?) do you send before this becomes a problem?

Generally 50,000 to 100,000 notifications if we run Pushy without any memory configuration. The problem depends heavily on the "-XX:MaxDirectMemorySize" argument we pass when running Pushy from the command line: the more direct memory we allocate, the longer it takes to throw this out-of-memory exception. We were able to send 1 million pushes with a single PushManager using a direct memory size of 1 GB; however, when sending multiple pushes at the same time, the problem occurs again.

- Is this new behaviour in v0.4, or did you also see this in v0.3?

We only tested this in v0.4. We also tried using the latest beta jar of Netty (4.1.0), but got the same results.

One thing is for sure: Pushy is way faster than alternative packages such as JavaPNS and java-apns. I really hope we can solve this memory issue so we can use it in production.

Cheers,

Mark
(Attachment: code.rtf)

Mark Salm

unread,
Oct 14, 2014, 4:27:51 AM10/14/14
to pushy...@googlegroups.com, bac...@yabila.com
Hi Ngoc,

Thanks for the feedback.  We have tried the ADVANCED memory leak detection, but got the same error (please see below at the end).

Is there any way Pushy can just use the available memory? When we increase the direct memory buffer, we only gain a few more seconds until this error shows up again.

Is there anything else we can do?  This is a 4GB memory dedicated server.

Thanks


DEBUG com.relayrides.pushy.apns.ApnsConnection - NabdPushManager-connection-0 caught an exception.
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:659)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:113)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:305)
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:434)
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:168)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:98)
at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:250)
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
at io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:83)
at io.netty.handler.ssl.SslHandler.allocate(SslHandler.java:1265)
at io.netty.handler.ssl.SslHandler.allocateOutNetBuf(SslHandler.java:1275)
at io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:453)
at io.netty.handler.ssl.SslHandler.flush(SslHandler.java:432)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:688)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:669)
at io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:115)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:688)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:718)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:706)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:741)
at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:895)
at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:240)
at com.relayrides.pushy.apns.ApnsConnection$3.run(ApnsConnection.java:532)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)

Ngoc Dao

unread,
Oct 14, 2014, 4:41:20 AM10/14/14
to pushy...@googlegroups.com, bac...@yabila.com
Maybe it's just because you're sending too fast.

You can try:
* Send 1000 msgs.
* Wait for some time (about 10 seconds?) and check whether Pushy's queue length is below a threshold.
* Send another 1000 msgs.

It seems that Pushy currently doesn't have any callback to let you know how many msgs have been sent:
https://github.com/relayrides/pushy/issues/130

Mark Salm

unread,
Oct 14, 2014, 4:47:38 AM10/14/14
to pushy...@googlegroups.com, bac...@yabila.com
Hi Ngoc,

As you can see in the code (attached above), we fill the queue with 1,000 requests, wait for it to empty, then send the next batch as soon as it does.

We are sending breaking-news alerts, so we need to send them in a timely manner. If you wait 10 seconds for each 1,000 and you have a million records, that's 166 minutes!! Not very practical.

Thanks

Jon Chambers

unread,
Oct 14, 2014, 10:22:20 AM10/14/14
to Mark Salm, pushy...@googlegroups.com
Well, I have two theories, both of which I'll test a little later today.

The first is that there's just a good, old-fashioned leak somewhere and we're shedding a little bit of memory every time we send a notification. This seems unlikely to me because (a) nobody's ever run into it before and (b) we have, historically, run (local) benchmarks with millions of notifications and a smaller heap and haven't seen this problem. Still, I'll run it through the profiler and see if anything jumps out. This problem would manifest if you were sending lots of notifications over a long period of time and memory ran out at a fairly predictable point every time.

The second—and I think most likely—scenario is that we're writing too fast and the listeners attached to each write operation are piling up. Throttling at the public queue* wouldn't help with this, since notifications leave the queue quickly and (loosely speaking) generate write listeners that may stick around for a while. Really, this disconnect is the core need we're trying to address in https://github.com/relayrides/pushy/pull/114. This problem would manifest if you're sending large bursts of notifications at once, and the failure point would be less predictable because it would depend on the speed at which writes were being "cleared." This seems like it closely matches the symptoms you're describing.
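To make that second theory concrete, here is a toy, JDK-only illustration (not actual Pushy or Netty code): if "writes" are produced faster than the channel drains them, the per-write state (listeners, buffers) backs up without bound.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PendingWritesDemo {
    // Simulates a producer that enqueues `produced` writes, while for every
    // `batch` writes the "network" only drains `drainPerBatch` of them.
    // Returns how many writes (and their attached per-write state) are still
    // pending at the end.
    static int pendingAfter(int produced, int batch, int drainPerBatch) {
        Deque<byte[]> pending = new ArrayDeque<>();
        for (int i = 0; i < produced; i++) {
            pending.add(new byte[256]); // stand-in for one buffered notification
            if ((i + 1) % batch == 0) {
                for (int d = 0; d < drainPerBatch && !pending.isEmpty(); d++) {
                    pending.poll(); // the slow "network" clears far fewer than we add
                }
            }
        }
        return pending.size();
    }

    public static void main(String[] args) {
        // 10,000 writes, draining only 100 per 1,000 produced: the backlog grows.
        System.out.println(pendingAfter(10_000, 1_000, 100));
    }
}
```

Throttling at the public queue doesn't help with this, because the backlog forms after notifications leave the queue.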

In either case, I'm confident we can find a solution once we've more accurately identified the problem. Mark, if we can't reproduce the issue locally, would you be willing to try a one-off debug build so we can gather additional data?

Backtracking to your earlier response:

We didn't configure any parameter for concurrent connections, so I presume the default is set.  What do you mean by this?

PushManagers can be configured to open an arbitrary number of parallel connections to the APNs gateway. See the `PushManagerConfiguration` docs (http://relayrides.github.io/pushy/apidocs/0.4/com/relayrides/pushy/apns/PushManagerConfiguration.html) for details. Just knowing you're using the default settings is enough, though.

Thanks for your patience!

-Jon

* For what it's worth, if you DO want to continue to do something like this, you might consider specifying your own `BlockingQueue` implementation with a finite capacity (you can pass in your BlockingQueue as a constructor argument to your PushManager). `ArrayBlockingQueue` should do exactly what you want with much less hassle.
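A minimal, JDK-only sketch of that suggestion (the PushManager wiring is omitted): a capacity-bounded `ArrayBlockingQueue` gives you back-pressure for free, with no hand-rolled busy-wait loop.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedQueueDemo {
    // Try to enqueue `attempts` items into a queue of the given capacity using
    // the non-blocking offer(); returns how many were actually accepted.
    // (A real producer would call the blocking put() and simply stall when full.)
    static int fillUpTo(int capacity, int attempts) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        int accepted = 0;
        for (int i = 0; i < attempts; i++) {
            if (queue.offer("notification-" + i)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(fillUpTo(1000, 1500)); // only the first 1000 fit
    }
}
```

The same queue instance would be passed to the PushManager constructor so the producer blocks instead of spinning.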


Jon Chambers

unread,
Oct 14, 2014, 12:40:47 PM10/14/14
to Mark Salm, pushy...@googlegroups.com
…also, you wrote: "We have tried to set a limit for the number of PushQueues, and we set them at 100."

I don't mean to split hairs, but Pushy doesn't have anything called PushQueue. Could you clarify? What, exactly, are you setting to "100?"

Thanks!

-Jon

Jon Chambers

unread,
Oct 14, 2014, 2:52:39 PM10/14/14
to Mark Salm, pushy...@googlegroups.com
I ran a test to send a million push notifications with a PushManager configured exactly the same way as yours to a mock server. Some preliminary findings:

- I'm not able to reproduce your issue using a mock server, which makes me feel more confident that this has something to do with write speed (since writes would be effectively instantaneous with a mock server on the same host).
- Monitoring with Visual VM suggests that we aren't leaking things on the heap.
- Running with Netty's ResourceLeakDetector at `PARANOID` detects no leaks.

I found some interesting information here, though the issue has long since been resolved: https://github.com/netty/netty/issues/1393

Nothing conclusive yet; I think understanding what you mean by "We have tried to set a limit for the number of PushQueues, and we set them at 100" will help clarify things quite a bit.

-Jon

Jon Chambers

unread,
Oct 16, 2014, 12:01:31 PM10/16/14
to pushy...@googlegroups.com, bac...@yabila.com
Mark,

I don't mean to be a nag, but I think I've done as much debugging as I can without more information from you. Specifically, can you please clarify what you mean by "We have tried to set a limit for the number of PushQueues, and we set them at 100?" I'm afraid I don't know what you mean by "PushQueue," and I didn't see anything being set to 100 in the example code you attached. I think understanding that will help us understand the problem more clearly.

Thanks!

-Jon

Andrew Aziz

unread,
Oct 18, 2014, 9:09:17 AM10/18/14
to pushy...@googlegroups.com, bac...@yabila.com
Hi Jon,

Mark is on leave and I'll handle this on his behalf.

First of all, thanks for your support.

We add notifications to the queue until it reaches 1,000, wait until the queue drains, then keep adding new notifications, and so on.

for (int i = 0; i < devices.size(); i++) {
    token = TokenUtil.tokenStringToByteArray(devices.get(i));
    pushManager.getQueue().put(new SimpleApnsPushNotification(token, payload));

    // Busy-wait until the queue drops back below the maximum (1,000)
    while (pushManager.getQueue().size() >= 1000) {
        // spin
    }
}

That's what we meant about the PushQueue.

Based on your previous replies, I'll increase the number of concurrent connections, disable log writing, and retry sending the pushes without capping the queue size as I did before.

Thanks.

Regards,
Andrew

Jon Chambers

unread,
Oct 24, 2014, 11:55:12 AM10/24/14
to pushy...@googlegroups.com, bac...@yabila.com
Thanks for the clarification!

> Based on your previous replies, I'll increase the number of concurrent connections, disable writing the logs, and retry sending the push without putting a maximum number of the queue size like I did before.

Before you go there, I have one last idea I want to explore. Are you sure ANY of these notifications are actually getting delivered? As a separate experiment, if you set the logging level to TRACE, start your push manager, sleep for ~10 seconds without sending any notifications, and shut down, what shows up in the logs? It doesn't make any sense that you'd be burning that much direct buffer memory under any circumstances, and much less so with a single connection. I wonder if something is going wrong and causing you to open zillions of connections unexpectedly. Trace log output would be a big help in debugging this issue.

Thanks!

-Jon

Ngoc Dao

unread,
Oct 27, 2014, 2:24:54 PM10/27/14
to pushy...@googlegroups.com, bac...@yabila.com
I'm not sure, but this problem may be related to this:

Jon Chambers

unread,
Oct 27, 2014, 2:26:27 PM10/27/14
to Ngoc Dao, pushy...@googlegroups.com, Mark Salm
Seems likely.

-Jon

Jon Chambers

unread,
Oct 27, 2014, 7:27:45 PM10/27/14
to pushy...@googlegroups.com, ngocda...@gmail.com, bac...@yabila.com
It appears that a fix is in the works for Netty 4.0.24. I've created an issue on our end to update as soon as it's out; we'll plan on putting out a point release to address this issue. Thanks for your patience!

-Jon

Andrew Aziz

unread,
Oct 29, 2014, 4:24:18 AM10/29/14
to pushy...@googlegroups.com, bac...@yabila.com
Hi Jon, 

Thanks for the info. My feedback is below inline.

Before you go there, I have one last idea I want to explore. Are you sure ANY of these notifications are actually getting delivered?

Yes, I'm sure of that; we do receive some of them on our devices.
 
As a separate experiment, if you set the logging level to TRACE, start your push manager, sleep for ~10 seconds without sending any notifications, and shut down, what shows up in the logs?

I got the normal connectivity trace log:

[main] DEBUG io.netty.util.internal.logging.InternalLoggerFactory - Using SLF4J as the default logging framework
[main] DEBUG io.netty.util.ResourceLeakDetector - -Dio.netty.leakDetectionLevel: simple
[main] DEBUG io.netty.channel.MultithreadEventLoopGroup - -Dio.netty.eventLoopThreads: 16
[main] DEBUG io.netty.util.internal.PlatformDependent0 - java.nio.Buffer.address: available
[main] DEBUG io.netty.util.internal.PlatformDependent0 - sun.misc.Unsafe.theUnsafe: available
[main] DEBUG io.netty.util.internal.PlatformDependent0 - sun.misc.Unsafe.copyMemory: available
[main] DEBUG io.netty.util.internal.PlatformDependent0 - java.nio.Bits.unaligned: true
[main] DEBUG io.netty.util.internal.PlatformDependent - UID: 501
[main] DEBUG io.netty.util.internal.PlatformDependent - Java version: 6
[main] DEBUG io.netty.util.internal.PlatformDependent - -Dio.netty.noUnsafe: false
[main] DEBUG io.netty.util.internal.PlatformDependent - sun.misc.Unsafe: available
[main] DEBUG io.netty.util.internal.PlatformDependent - -Dio.netty.noJavassist: false
[main] DEBUG io.netty.util.internal.PlatformDependent - Javassist: unavailable
[main] DEBUG io.netty.util.internal.PlatformDependent - You don't have Javassist in your class path or you don't have enough permission to load dynamically generated classes.  Please check the configuration for better performance.
[main] DEBUG io.netty.util.internal.PlatformDependent - -Dio.netty.tmpdir: /var/folders/zh/bzf7sjyj1tv6xt5s5ws27vm40000gn/T (java.io.tmpdir)
[main] DEBUG io.netty.util.internal.PlatformDependent - -Dio.netty.bitMode: 64 (sun.arch.data.model)
[main] DEBUG io.netty.util.internal.PlatformDependent - -Dio.netty.noPreferDirect: false
[main] DEBUG io.netty.channel.nio.NioEventLoop - -Dio.netty.noKeySetOptimization: false
[main] DEBUG io.netty.channel.nio.NioEventLoop - -Dio.netty.selectorAutoRebuildThreshold: 512
[main] INFO com.relayrides.pushy.apns.PushManager - NabdPushManager starting.
[main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.numHeapArenas: 1
[main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.numDirectArenas: 1
[main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.pageSize: 8192
[main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.maxOrder: 11
[main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.chunkSize: 16777216
[main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.tinyCacheSize: 512
[main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.smallCacheSize: 256
[main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.normalCacheSize: 64
[main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.maxCachedBufferCapacity: 32768
[main] DEBUG io.netty.buffer.PooledByteBufAllocator - -Dio.netty.allocator.cacheTrimInterval: 8192
[main] DEBUG com.relayrides.pushy.apns.ApnsConnection - NabdPushManager-connection-0 beginning connection process.
[main] DEBUG io.netty.util.internal.ThreadLocalRandom - -Dio.netty.initialSeedUniquifier: 0x0f047d8689b7c4c4 (took 0 ms)
[main] DEBUG io.netty.buffer.ByteBufUtil - -Dio.netty.allocator.type: unpooled
[main] DEBUG io.netty.buffer.ByteBufUtil - -Dio.netty.threadLocalDirectBufferSize: 65536
{"article":"12577137","aps":{"alert":"Alert","sound":"default","badge":13}}

Sleeping for 10 seconds

[nioEventLoopGroup-2-1] DEBUG com.relayrides.pushy.apns.ApnsConnection - NabdPushManager-connection-0 connected; waiting for TLS handshake.
[nioEventLoopGroup-2-1] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.maxCapacity.default: 262144
[nioEventLoopGroup-2-1] DEBUG com.relayrides.pushy.apns.ApnsConnection - NabdPushManager-connection-0 successfully completed TLS handshake.
[nioEventLoopGroup-2-1] DEBUG io.netty.handler.ssl.SslHandler - [id: 0x42eda846, /192.168.1.6:63300 => gateway.push.apple.com/17.110.227.26:2195] HANDSHAKEN: TLS_RSA_WITH_AES_128_CBC_SHA

--> After waiting 10 seconds: Shutting down!

[main] INFO com.relayrides.pushy.apns.PushManager - NabdPushManager shutting down.
[nioEventLoopGroup-2-1] DEBUG com.relayrides.pushy.apns.ApnsConnection - NabdPushManager-connection-0 sending known-bad notification to shut down.
[nioEventLoopGroup-2-1] DEBUG com.relayrides.pushy.apns.ApnsConnection - APNs gateway rejected notification with sequence number 1 from NabdPushManager-connection-0 (MISSING_TOKEN).
[nioEventLoopGroup-2-1] DEBUG io.netty.buffer.PoolThreadCache - Freed 16 thread-local buffer(s) from thread: nioEventLoopGroup-2-1

Jon Chambers

unread,
Oct 29, 2014, 12:41:27 PM10/29/14
to Andrew Aziz, pushy...@googlegroups.com, Mark Salm
Pushy v0.4.1, which uses the just-released Netty 4.0.24, has just been released and is making its way to Maven Central. Once you update to v0.4.1, this problem should go away.

Thanks for your patience!

-Jon

郭瑞

unread,
Nov 10, 2014, 10:03:18 PM11/10/14
to pushy...@googlegroups.com, a.a...@yabila.com, bac...@yabila.com
Hi,

We hit the same problem in Pushy v0.4.1. It seems that the more often APNs rejects our notifications with INVALID_TOKEN, the more quickly we run out of direct memory. When our service throws the direct-buffer out-of-memory exception, heap memory usage is very low, and the stack trace is just like the one Mark posted in his first message.

Our service runs on a Linux machine with 8 GB memory and 4 CPUs, with -Xmx3g -Xms3g. We send nearly 50 million messages to APNs a day, and about 500 thousand of them are rejected by APNs with INVALID_TOKEN.

We still don't know why.....


On Thursday, October 30, 2014 at 12:41:27 AM UTC+8, Jon Chambers wrote:

Ngoc Dao

unread,
Nov 10, 2014, 10:19:04 PM11/10/14
to pushy...@googlegroups.com, a.a...@yabila.com, bac...@yabila.com
> Our service is run on a Linux machine with 8G memory, 4 CPU, with -Xmx=3g -Xms=3g. We sent nearly 50 millions messages to APNS a day

How long does it take to send 50 million messages?
Do you have to write a rate limiter by hand to avoid sending too fast?

郭瑞

unread,
Nov 10, 2014, 10:30:27 PM11/10/14
to pushy...@googlegroups.com, a.a...@yabila.com, bac...@yabila.com
40 million of them are sent in 2 hours in the morning (to be more specific, about half of them get filtered out, so roughly 20 million messages are actually sent to APNs during those 2 hours; we have 10 machines, each with the same OS, 4 CPUs, 8 GB memory, and so on). The remaining 10 million are sent over the rest of the day.

Our JDK version is 1.6.




On Tuesday, November 11, 2014 at 11:19:04 AM UTC+8, Ngoc Dao wrote:

Ngoc Dao

unread,
Nov 10, 2014, 11:15:04 PM11/10/14
to pushy...@googlegroups.com, a.a...@yabila.com, bac...@yabila.com
Rate of INVALID_TOKEN = 500 thousand / (20 + 10 million) = 1.67% (high).

As I understand it, when there's an INVALID_TOKEN error the connection is closed and Pushy has to reconnect.
A 1.67% rejection rate means Pushy has to reconnect, on average, after only every 60 messages.

Maybe the out-of-memory error happens more often because Pushy has to reconnect (and redo the SSL handshake) more often.
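That arithmetic, as a tiny runnable sketch (numbers taken from this thread: roughly 500 thousand rejections out of about 30 million messages a day):

```java
public class ReconnectRateMath {
    // With `rejected` rejections out of `total` sends, and APNs closing the
    // connection on every rejection, one reconnect happens on average every
    // total/rejected messages.
    static long messagesPerReconnect(long rejected, long total) {
        return Math.round((double) total / rejected);
    }

    public static void main(String[] args) {
        // ~500k INVALID_TOKEN rejections out of ~30M daily messages
        System.out.println(messagesPerReconnect(500_000L, 30_000_000L));
    }
}
```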

Ngoc Dao

unread,
Nov 10, 2014, 11:22:27 PM11/10/14
to pushy...@googlegroups.com, a.a...@yabila.com, bac...@yabila.com
郭瑞, do you know why your app has so many INVALID_TOKEN errors?
Maybe you're using the wrong APNs certificates (client side or server side)?

alven.guo

unread,
Nov 11, 2014, 12:46:29 AM11/11/14
to pushy...@googlegroups.com, a.a...@yabila.com, bac...@yabila.com
Thanks a lot for your prompt reply.

There are a lot of reasons why we have so many INVALID_TOKEN errors, but I'm definitely sure it's not because of wrong certificates.

I'll try setting a rate limit to see if the problem still exists. But I wonder why sending messages too fast can cause direct memory to run out? I also found that in our service the memory usage barely drops. Whenever there are a lot of INVALID_TOKEN rejections, memory usage grows a little; a high INVALID_TOKEN rate causes a sharp curve in memory usage. So it grows, grows, grows again... and eventually the direct-memory exception is thrown. As I said, we have 8 GB of memory, but -Xmx is 3g to make the service live longer.

Another clue: when we upgraded Pushy to 0.4.1, the problem was relieved a little; our service can stay alive for several days, compared to only one day with Pushy 0.4.

Anyway, I will keep digging into the problem. Thanks again for your help.








On Tuesday, November 11, 2014 at 12:22:27 PM UTC+8, Ngoc Dao wrote:

Ngoc Dao

unread,
Nov 11, 2014, 1:27:54 AM11/11/14
to pushy...@googlegroups.com, a.a...@yabila.com, bac...@yabila.com
> why can sending messages too fast cause direct memory to run out?

It's discussed here:
https://github.com/relayrides/pushy/issues/139
https://groups.google.com/forum/#!topic/pushy-apns/zLAWT-T7_3k

You can limit push rate using TrafficCounter like this:

import io.netty.handler.traffic.TrafficCounter;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

final long MAX_WAIT_TIME = 1000;
final long CHECK_INTERVAL = MAX_WAIT_TIME * 3;
final long MSG_PER_SEC = 1000;

ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);
TrafficCounter trafficCounter = new TrafficCounter(null, executor, "name", CHECK_INTERVAL);
trafficCounter.start();

After each push, sleep some time if necessary:

long ms = trafficCounter.writeTimeToWait(1, MSG_PER_SEC, MAX_WAIT_TIME);
if (ms > 0) Thread.sleep(ms);
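If you'd rather not involve TrafficCounter, the same "sleep as needed" idea can be expressed with a small JDK-only helper (a sketch; `sendOneNotification()` is a hypothetical placeholder for your actual send call):

```java
public class SendRateMath {
    // Pure helper behind a sleep-based limiter: how long to wait before the
    // next send so that `sentSoFar` messages over `elapsedMillis` stays at or
    // below `maxPerSecond`.
    static long millisToWait(long sentSoFar, long elapsedMillis, long maxPerSecond) {
        long earliestAllowedMillis = (sentSoFar * 1000L) / maxPerSecond;
        return Math.max(0L, earliestAllowedMillis - elapsedMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        long maxPerSecond = 1_000;
        for (long sent = 0; sent < 200; sent++) {
            // sendOneNotification();  // hypothetical: enqueue one push here
            long wait = millisToWait(sent + 1, System.currentTimeMillis() - start, maxPerSecond);
            if (wait > 0) Thread.sleep(wait); // pace ourselves to maxPerSecond
        }
        System.out.println("done");
    }
}
```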

alven.guo

unread,
Nov 11, 2014, 5:23:32 AM11/11/14
to pushy...@googlegroups.com, a.a...@yabila.com, bac...@yabila.com
OK, thanks a lot. If I make some progress, I'll post it here.



On Tuesday, November 11, 2014 at 2:27:54 PM UTC+8, Ngoc Dao wrote:

Jon Chambers

unread,
Nov 11, 2014, 12:05:17 PM11/11/14
to alven.guo, pushy...@googlegroups.com, Andrew Aziz, Mark Salm
Thanks, all. This is really helpful information, and we'll do everything we can to get to the bottom of it.

-Jon


Ngoc Dao

unread,
Nov 12, 2014, 11:26:18 PM11/12/14
to pushy...@googlegroups.com, alve...@gmail.com, a.a...@yabila.com, bac...@yabila.com
I got a heap dump when my program was out of memory.

I opened the heap dump file with Eclipse Memory Analyzer:

I used one instance of Pushy with 15 connections.
The queue was an ArrayBlockingQueue of size 5000.

From image 1 and 2, you can see that Pushy and Netty used half of the memory (2 GB).
The JVM memory was set to 4 GB max.

From image 3 and 4 you can see that Pushy's messages were stored in some kind of linked list.

From image 5 you can see that there were lots of instances of Pushy's connections and notifications (maybe they hadn't been garbage collected?).

I'll share the heap dump file if you want.
It's about 5 GB (you'll need a machine with about 10 GB of memory to open it with Eclipse Memory Analyzer).

(Images 1-5: Eclipse Memory Analyzer screenshots, attached in the original post.)

Jon Chambers

unread,
Nov 13, 2014, 3:38:15 PM11/13/14
to Ngoc Dao, pushy...@googlegroups.com, 郭瑞, Andrew Aziz, Mark Salm
Thanks; this is also very helpful. I'm still trying to nail down a concrete reproduction case, but I have a hunch that it has to do with cycling connections. Understanding what's holding on to the ApnsConnection references will be a big help, too.

-Jon

alven.guo

unread,
Nov 19, 2014, 2:09:00 AM11/19/14
to pushy...@googlegroups.com, ngocda...@gmail.com, alve...@gmail.com, a.a...@yabila.com, bac...@yabila.com
I've got something new here. Hope I'm not bothering you too much. :)
When Pushy consumes a lot of direct memory, I found a lot of ChannelOutboundBuffer$Entry instances in the heap, as follows:

(Screenshot attached in the original post.)

On Friday, November 14, 2014 at 4:38:15 AM UTC+8, Jon Chambers wrote:

alven.guo

unread,
Nov 19, 2014, 2:43:08 AM11/19/14
to pushy...@googlegroups.com, ngocda...@gmail.com, alve...@gmail.com, a.a...@yabila.com, bac...@yabila.com

Choosing some of the ChannelOutboundBuffer$Entry instances in the heap, I got the same GC root path (screenshot attached in the original post):

It looks like Netty somehow doesn't recycle ChannelOutboundBuffer$Entry properly, but I don't know why yet.



On Wednesday, November 19, 2014 at 3:09:00 PM UTC+8, alven.guo wrote:

Jon Chambers

unread,
Nov 20, 2014, 11:52:38 AM11/20/14
to pushy...@googlegroups.com, ngocda...@gmail.com, alve...@gmail.com, a.a...@yabila.com, bac...@yabila.com
Again, thanks for the information. All of this is really helpful. I realized we hadn't actually created a bug report for this, so I went ahead and opened an issue here: https://github.com/relayrides/pushy/issues/142. I'm still working to build a minimal test case, though, and I regret that I've had other things on my plate and haven't been able to give it my full focus so far.

-Jon