How to troubleshoot memory leak?

309 views
Skip to first unread message

libn...@gmail.com

unread,
Jul 29, 2015, 10:14:18 AM7/29/15
to golang-nuts
I have a program (actually it's Cloud Foundry doppler, and compiled with go 1.4.2) running on Ubuntu 12.04, and it's eating memory very fast: after running for 13 hours, it has occupied ~12G memory, as shown below. And finally, it'll be killed after eating up all memory.

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                     
12911 vcap      20   0 54.7g  12g 6300 R  109 80.6   1171:50 doppler 


I enabled the debug/pprof, and the got the MemStats as below:

# runtime.MemStats
# Alloc = 5790896
# TotalAlloc = 900122169088
# Sys = 19020024
# Lookups = 2918
# Mallocs = 10624350808
# Frees = 10624301349
# HeapAlloc = 5790896
# HeapSys = 10452992
# HeapIdle = 2695168
# HeapInuse = 7757824
# HeapReleased = 974848
# HeapObjects = 49459
# Stack = 5275648 / 5275648
# MSpan = 112112 / 147456
# MCache = 9600 / 16384
# BuckHashSys = 1737720
# NextGC = 6816480
# PauseNs = [2789191 3968340 2914668 2698380 2156193 2868260 4238862 2679187 3679682 2670721 3076884 2635763 3907943 4293306 4176519 3619091 3915664 3201470 3302607 4107046 2889968 3414808 2928959 2752206 3223438 3090283 4100852 3516592 2614265 4659160 4753029 4209772 5188335 5211843 3407856 4129701 6532486 3303685 5411276 3482420 3832991 3896763 3296063 3842593 3659289 5038146 3502151 2922496 3151741 3486885 2520300 5088027 2930584 6808167 3103872 3661226 3887416 3769384 2712990 2668005 4305291 7924831 4704789 4236156 2626356 3775363 3498518 3736749 3785753 3124459 4086828 3807204 4569470 3322812 3883025 3952671 2864986 2861727 3085843 2962661 3095698 3609767 4353487 3192113 3352075 2992260 2551665 2590794 2466847 2696792 2953302 3006632 3657759 3987252 2307301 3892239 3032933 2631966 5388308 3060818 3023478 3801224 3852953 5759576 4049063 3725653 3614695 3900773 3782371 4029822 2725097 2783054 4665304 3860404 2956914 2856477 3842424 3174915 3414377 5628262 4052199 3439270 2888891 2682672 2765040 2697610 3214515 3021781 3541053 3559635 3056579 5338241 3672181 4633665 3698984 3910529 3174523 4183345 2563621 3705034 2490728 2938635 2998140 3164119 4042022 2535621 3264872 5307125 3245727 3651874 3498360 5021546 2846740 2234997 3273143 3143630 3827433 3405533 3425180 3729446 3905734 6086010 3651008 4207423 3439688 2983425 2918028 3880283 2474659 3259304 3596237 3206893 2807185 3022692 5291911 2655451 3470372 6554980 4981446 2530514 2782124 4091653 3009388 3811089 3454175 4425638 3169496 2902751 4747865 3400250 3830256 2252663 4323584 3278660 4690746 2785606 2981928 3067261 4995243 2525487 2778814 3366327 3273042 3169190 3842609 4485684 4180762 3234563 2458692 4087225 5303966 3346635 3265682 3069302 3923853 3836920 3677528 3208429 3237576 3909200 4232523 3552255 3295393 4167230 2736900 2760396 2918083 2606986 2920763 2638301 2979999 2876863 7343961 3594270 5151323 2698290 3740779 3045720 2640011 3704440 2951753 2807708 4987305 3522284 6014753 3950313 3109647 2044458 3050710 4194351 3361451 4422128 4953414 3148980 2629482 4759608]
# NumGC = 288486
# EnableGC = true
# DebugGC = false

You can see that in the go's perspective, there is only 5790896 used, which is very small compared with 12G RSS usage. 

I wonder where the memory leaked. 

How could I do further troubleshooting? Thanks

James Bardin

unread,
Jul 29, 2015, 1:18:02 PM7/29/15
to golang-nuts, libn...@gmail.com

Not sure offhand, but this article covers any tips I'd provide, and more: https://software.intel.com/en-us/blogs/2014/05/10/debugging-performance-issues-in-go-programs

Other clues to look for are allocations in C code (does this use any cgo?), and check for excessive threads which could indicate problems with cgo too.

Dave Cheney

unread,
Jul 30, 2015, 4:28:26 PM7/30/15
to golang-nuts
The number you want is HeapInUse which is 7g, plus the 5g from stacks makes 12g.

It is likely you have a large set of goroutines that had not exitied, holding up their stacks indefinitely.

You can get this data from the /debug/pprof endpoint.

Please let me know your results.

Thanks

Dave

libn...@gmail.com

unread,
Jul 31, 2015, 1:35:09 AM7/31/15
to golang-nuts, libn...@gmail.com, j.ba...@gmail.com
There is no C code, but it seems that this issue does relate with cgo, because I resolved the problem by adding CGOENABLED=0 when building.

libn...@gmail.com

unread,
Jul 31, 2015, 1:50:34 AM7/31/15
to golang-nuts, libn...@gmail.com, j.ba...@gmail.com
CGO_ENABLED=0 

Dave Cheney

unread,
Jul 31, 2015, 2:01:32 AM7/31/15
to golang-nuts, libn...@gmail.com, j.ba...@gmail.com
That suggests to me that it might be a problem with dns resolution. With cgo enabled (the default), then dns resolution will go via cgo to libc, consuming an operating system thread til the operation is complete.

The simplest way to debug this is to wait til the process bloats, then hit it with SIGQUIT and put the *entire* output into a gist or pastebin and post it so we can review.

Thanks

Dave

libn...@gmail.com

unread,
Jul 31, 2015, 3:55:07 AM7/31/15
to golang-nuts, libn...@gmail.com, j.ba...@gmail.com, da...@cheney.net
Here is the output, thanks.


Dave Cheney

unread,
Jul 31, 2015, 4:19:44 AM7/31/15
to golang-nuts
Thanks. how large was the process at that point?

libn...@gmail.com

unread,
Jul 31, 2015, 5:21:28 AM7/31/15
to golang-nuts, da...@cheney.net
As show below, memory usage is 38g. Thanks.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                    
 1686 root      20   0 41.9g  38g 5212 S   17 61.5  13026:38 metron   

Dave Cheney

unread,
Jul 31, 2015, 5:29:20 AM7/31/15
to libn...@gmail.com, golang-nuts
Thanks. It looks like you are not leaking goroutines as there are only a few dozen active at a that point.

I cannot explain why disabling cgo would have an impact on the size of your process. 

Matt Harden

unread,
Jul 31, 2015, 9:17:54 AM7/31/15
to Dave Cheney, libn...@gmail.com, golang-nuts
I believe the only things the standard library calls out to C for are DNS lookups in the net package and username lookups in os/user. It seems likely that one of these activities is leaking memory in C on your system. Do you have an unusual DNS resolver in place perhaps?

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

James Bardin

unread,
Jul 31, 2015, 10:05:37 AM7/31/15
to Matt Harden, Dave Cheney, libn...@gmail.com, golang-nuts
None of the extensive deps make use of cgo either, so I'm puzzled too. If you want to keep troubleshooting...

How many system threads does your process have when you kill it?

Looking at open file descriptors can sometimes be helpful. I see it uses fsnotify, which has been difficult to get right, so maybe something filesystem related?

You could also inspect the process memory map for more clues (/proc/$PID/maps, or `pmap -x $PID` for nicer output). Not sure what you might find that could help, but the fact we have allocated memory that Go isn't reporting is curious. 



--
You received this message because you are subscribed to a topic in the Google Groups "golang-nuts" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/golang-nuts/4XkyIWj2uy4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to golang-nuts...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages