[grpc-java 1.0.0] Using grpc in production causes a memory leak


Taehyun Park

Sep 27, 2016, 8:26:34 PM
to grpc.io
Hello, I have a memory leak problem: my server application hangs due to an out-of-heap exception. This kind of memory leak is very new to me, so I'm not sure how I can resolve this issue.
Could anyone take a look at the leak suspects below and tell me if this is caused by grpc?
Please let me know if you need further information from my heap dump.

Thank you in advance. 

Problem Suspect 1

5,938 instances of "io.netty.buffer.PoolThreadCache", loaded by "sun.misc.Launcher$AppClassLoader @ 0x6c621ecf8" occupy 1,280,098,864 (57.78%) bytes.


Problem Suspect 2

18 instances of "io.netty.util.concurrent.FastThreadLocalThread", loaded by "sun.misc.Launcher$AppClassLoader @ 0x6c621ecf8" occupy 477,526,856 (21.56%) bytes.



Biggest instances:

  • io.netty.util.concurrent.FastThreadLocalThread @ 0x6c7e01360 epollEventLoopGroup-3-9 - 36,210,360 (1.63%) bytes.

  • io.netty.util.concurrent.FastThreadLocalThread @ 0x6c7e55440 epollEventLoopGroup-3-16 - 34,013,456 (1.54%) bytes.

  • io.netty.util.concurrent.FastThreadLocalThread @ 0x6c7e06c68 epollEventLoopGroup-3-15 - 33,574,944 (1.52%) bytes.

  • io.netty.util.concurrent.FastThreadLocalThread @ 0x6c8359008 epollEventLoopGroup-3-11 - 31,410,768 (1.42%) bytes.

  • io.netty.util.concurrent.FastThreadLocalThread @ 0x6c808dff0 epollEventLoopGroup-3-4 - 30,982,336 (1.40%) bytes.

  • io.netty.util.concurrent.FastThreadLocalThread @ 0x6c81be308 epollEventLoopGroup-3-5 - 30,751,936 (1.39%) bytes.

  • io.netty.util.concurrent.FastThreadLocalThread @ 0x6c7ef0ef0 epollEventLoopGroup-3-12 - 30,093,296 (1.36%) bytes.

  • io.netty.util.concurrent.FastThreadLocalThread @ 0x6c84370a8 epollEventLoopGroup-3-10 - 28,821,824 (1.30%) bytes.

  • io.netty.util.concurrent.FastThreadLocalThread @ 0x6c7ee6d40 epollEventLoopGroup-3-13 - 28,559,312 (1.29%) bytes.

  • io.netty.util.concurrent.FastThreadLocalThread @ 0x6c8239158 epollEventLoopGroup-3-2 - 28,383,352 (1.28%) bytes.

  • io.netty.util.concurrent.FastThreadLocalThread @ 0x6c7f5b168 epollEventLoopGroup-3-8 - 28,157,184 (1.27%) bytes.

  • io.netty.util.concurrent.FastThreadLocalThread @ 0x6c8021f68 epollEventLoopGroup-3-6 - 28,136,600 (1.27%) bytes.

  • io.netty.util.concurrent.FastThreadLocalThread @ 0x6c80ceb10 epollEventLoopGroup-3-1 - 28,117,656 (1.27%) bytes.

  • io.netty.util.concurrent.FastThreadLocalThread @ 0x6c8160f90 epollEventLoopGroup-3-3 - 27,727,816 (1.25%) bytes.

  • io.netty.util.concurrent.FastThreadLocalThread @ 0x6c7f1e778 epollEventLoopGroup-3-14 - 26,839,032 (1.21%) bytes.

  • io.netty.util.concurrent.FastThreadLocalThread @ 0x6c8324368 epollEventLoopGroup-3-7 - 25,741,736 (1.16%) bytes.



Problem Suspect 3

167,233 instances of "io.netty.util.Recycler$WeakOrderQueue", loaded by "sun.misc.Launcher$AppClassLoader @ 0x6c621ecf8" occupy 312,969,784 (14.13%) bytes.



-histo:live

 num     #instances         #bytes  class name
----------------------------------------------
   1:        833363     1513088008  [Ljava.lang.Object;
   2:        722280      473815680  io.netty.util.internal.shaded.org.jctools.queues.MpscArrayQueue
   3:        384506       49012000  [Lio.netty.util.Recycler$DefaultHandle;
   4:        230678       26402920  [C
   5:        666720       21335040  io.netty.buffer.PoolThreadCache$SubPageMemoryRegionCache
   6:        490524       15696768  io.netty.util.Recycler$DefaultHandle
   7:        340638       13625520  io.netty.util.Recycler$WeakOrderQueue
   8:        404527       12944864  io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
   9:        365494       11695808  io.netty.util.Recycler$WeakOrderQueue$Link
  10:        341456       10926592  java.lang.ref.WeakReference
  11:         82884        9542232  [B
  12:         85442        7518896  io.netty.buffer.PooledUnsafeDirectByteBuf
  13:        174182        6967280  java.util.WeakHashMap$Entry
  14:        212647        5103528  java.lang.String
  15:         61547        4431384  net.sf.ehcache.Element
  16:          7825        4131336  [Lio.netty.handler.codec.http2.internal.hpack.HeaderField;
  17:         55560        3852160  [Lio.netty.buffer.PoolThreadCache$MemoryRegionCache;


Eric Anderson

Oct 1, 2016, 4:02:08 PM
to Taehyun Park, grpc.io
If you have high direct memory usage, typically it isn't a memory leak and instead is caused by excessive buffering due to the application sending too many RPCs concurrently (e.g., you send 1000 1MB RPCs simultaneously) or ignoring flow control (isReady/onReady when using streaming). Could you be causing too much buffering?
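For illustration, here is a minimal sketch of respecting server-side flow control with isReady()/setOnReadyHandler() on grpc-java's ServerCallStreamObserver (those APIs are real; the ChunkService/Request/Chunk types and loadChunks() are hypothetical placeholders for your own generated stubs and data source):

import io.grpc.stub.ServerCallStreamObserver;
import io.grpc.stub.StreamObserver;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical server-streaming handler, used only to illustrate the pattern.
public class ChunkServiceImpl extends ChunkServiceGrpc.ChunkServiceImplBase {
  @Override
  public void streamChunks(Request req, StreamObserver<Chunk> responseObserver) {
    ServerCallStreamObserver<Chunk> observer =
        (ServerCallStreamObserver<Chunk>) responseObserver;
    Iterator<Chunk> chunks = loadChunks(req);      // placeholder data source
    AtomicBoolean completed = new AtomicBoolean();

    Runnable drain = () -> {
      // Write only while the transport can accept data; anything sent while
      // isReady() is false just piles up in gRPC/Netty buffers.
      while (observer.isReady() && chunks.hasNext()) {
        observer.onNext(chunks.next());
      }
      if (!chunks.hasNext() && completed.compareAndSet(false, true)) {
        observer.onCompleted();
      }
    };
    // Resume writing when the transport drains instead of queueing everything up front.
    observer.setOnReadyHandler(drain);
    drain.run();
  }
}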

If you do think there is a memory leak of the direct buffers, you can enable Netty's leak detector. If it catches anything, please file an issue.
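For reference, one way to turn the leak detector up (a sketch; ResourceLeakDetector.setLevel() is Netty's API and has roughly the same effect as passing -Dio.netty.leakDetection.level=advanced on the JVM command line; it must run before the first buffer is allocated):

import io.netty.util.ResourceLeakDetector;

public final class LeakDetectionSetup {
  public static void main(String[] args) {
    // Roughly equivalent to -Dio.netty.leakDetection.level=advanced; must be
    // set before Netty allocates its first ByteBuf.
    ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.ADVANCED);
    // ... build and start the gRPC server/channel as usual ...
  }
}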


Taehyun Park

Oct 2, 2016, 1:11:44 PM
to grpc.io, gold.d...@gmail.com
Thank you for your reply, Eric.

The memory usage kept growing gradually for 7 days, and I had to restart the server application once it had used 13G out of the 16G of RAM.

I'm not sure about ignoring flow control. Could you give me an example, or point me to a document I can look at, to check whether I ignored flow control? Is there any way I can check whether the server is buffering too much? 128K is the maximum message size for this server. It usually has 500~800 channels and has never exceeded that.

Taehyun Park

Oct 6, 2016, 8:18:23 AM
to grpc.io, gold.d...@gmail.com

2016-10-06 10:27:07,615 [ERROR] from i.n.u.ResourceLeakDetector in epollEventLoopGroup-3-12 - LEAK: ByteBuf.release() was not called before it's garbage-collected. Enable advanced leak reporting to find out where the leak occurred. To enable advanced leak reporting, specify the JVM option '-Dio.netty.leakDetection.level=advanced' or call ResourceLeakDetector.setLevel() See http://netty.io/wiki/reference-counted-objects.html for more information.

I got this message in the logs, so I will enable advanced leak detection to get more details.




cr...@livewirelabs.com.au

Oct 19, 2016, 3:56:17 AM
to grpc.io

We have just encountered the same issue: we have been in production for about a week and see exactly the same leak. We have just started investigating and will let you know how we go.

Cheers
Craig

cr...@livewirelabs.com.au

Oct 19, 2016, 3:59:40 AM
to grpc.io, cr...@livewirelabs.com.au
fyi

cr...@livewirelabs.com.au

Oct 19, 2016, 9:36:31 PM
to grpc.io
I have worked out what the issue is. gRPC 1.0.0 and 1.0.1 are both based on Netty 4.1.3.Final. This version of Netty is affected by a memory leak bug that was reported and fixed in the following commit:


The vanilla gRPC ServerBuilder uses an Executors.newCachedThreadPool() executor that expands and contracts based on load. Any thread that is cleaned up leaks a whole lot of Netty cached data. The workaround is to specify your own Executor that reuses its threads rather than recycling them, something like:

Executors.newFixedThreadPool(serverConfig.getGrpcThreadPoolSize())
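For context, wiring such an executor into the server might look something like this (a sketch: ServerBuilder.executor() is the real grpc-java hook, while the port, pool size, and MyServiceImpl are placeholders):

import io.grpc.Server;
import io.grpc.ServerBuilder;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public final class GrpcServerBootstrap {
  public static void main(String[] args) throws Exception {
    // Fixed-size pool: threads live for the life of the server, so per-thread
    // Netty caches are not repeatedly created and then abandoned.
    ExecutorService appExecutor =
        Executors.newFixedThreadPool(16);   // e.g. serverConfig.getGrpcThreadPoolSize()

    Server server = ServerBuilder.forPort(50051)
        .executor(appExecutor)              // replaces the default cached thread pool
        .addService(new MyServiceImpl())    // placeholder service implementation
        .build()
        .start();
    server.awaitTermination();
  }
}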

The current unreleased gRPC code looks like it has moved to Netty 4.1.6.Final, which has this issue fixed.

Cheers
Craig


Taehyun Park

Oct 21, 2016, 4:39:50 AM
to grpc.io, cr...@livewirelabs.com.au
Thank you for this information! I will use Netty 4.1.6.Final to resolve this issue.

Eric Anderson

Oct 31, 2016, 12:37:10 PM
to Taehyun Park, grpc.io, cr...@livewirelabs.com.au
Craig filed issue 2358. Resolution will be discussed there.
