Unable to find small memory leak

385 views
Skip to first unread message

Miles Hex

unread,
Feb 9, 2021, 2:05:23 PM2/9/21
to golang-nuts

Hi,

I'm using golang (1.15.6) to write a daemon that handle system task(network, time, updates, etc..)  in embedded device (an onion omega 2+), and i'm encountering a small memory leak that i'm unable to identify and fix.
The device is using linux 4.14.171, the architecture is mips.

At first i used "go tool pprof" and the memory profile but the "inuse" index is always empty, and that kinda match my expectation since the application spend most of his time idle awaiting commands/event (i must add that he leak occur even when the application is idle). I also checked the number of goroutine that remains constant.

I then added some logs to try to understand when this allocation occurs (and the subsequent leak), but i'm unable to make sense of the data.
I also collected a couple of heapdump that i'm trying to analyze with
"github.com/temorfeouz/heapdump14" (which is a fork of https://github.com/randall77/heapdump14, and the only tool i find that can open heapdump of go 1.15)

Following is the (partial) memstat data:

MemStast Before:
  "HeapAlloc": 1811448,
  "HeapSys": 7176192,
  "HeapIdle": 4399104,
  "HeapInuse": 2777088,
  "HeapReleased": 3358720,
  "HeapObjects": 11200,
  "StackInuse": 1212416,
  "StackSys": 1212416,

MemStat After 7 days:
  "HeapAlloc": 2033048,
  "HeapSys": 11403264,
  "HeapIdle": 8257536,
  "HeapInuse": 3145728,
  "HeapReleased": 8257536,
  "HeapObjects": 13060,
  "StackInuse": 1179648,
  "StackSys": 1179648,

I also have the summary from the two heapdump
Before:
Heap size: 8.0 MB bytes
Heap live: 1.8 MB bytes
Heap objects: 11496
Reachable objects: 10492
Reachable size: 1.6 MB bytes

After 7 days:
Heap size: 12.0 MB bytes
Heap live: 2.0 MB bytes
Heap objects: 13198
Reachable objects: 10454
Reachable size: 1.6 MB bytes

I have collected a lot more data, including all the profile made available from the go runtime and various system information.

If i'm reading this data correctly there is a memory leak as the heap si growing but i'm not sure how to find it! I tough i could try to diff the two heapdump with excel but the tool that i'm using is not very "precise" and a lot of type information seem to be lost.

I don't know how to make sense of this data, what would your next step? I'm quite new in the golang ecosystem and i'm unsure how to move from here.

Thanks.





Uli Kunitz

unread,
Feb 9, 2021, 2:39:27 PM2/9/21
to golang-nuts
GODEBUG=allocfreetrace may be what you are looking for. You might want to check the output with sort | uniq -c.

Uli Kunitz

unread,
Feb 9, 2021, 2:57:36 PM2/9/21
to golang-nuts
It is GODEBUG=allocfreetrace=1 . Documentation for GODEBUG can be found here: https://pkg.go.dev/runtime

Miles Hex

unread,
Feb 9, 2021, 4:43:42 PM2/9/21
to golang-nuts
Thank you, tomorrow i will try it!
Do you know if it possible to enable it without changing the enviroment variable? My application is started by another one and i can't (easily) change the enviroment variables.

Uli Kunitz

unread,
Feb 9, 2021, 6:39:13 PM2/9/21
to golang-nuts
I looked at your data again. Are you sure that you have a memory leak? According to the heap dump you have less reachable objects after 7 days than before. The reachable object size is in both cases 1.6 MByte. The live heap is a little bit larger than seven days before, but is probably caused by fragmentation and appears to me acceptable. The overall heap size increased by 4 MB, but it means that you probably need that size to handle requests. 

You can check that by enabling GODEBUG=gctrace=1. Your program has only a memory leak if the live heap size increases continuously. BTW you can set the environment variable and start the program that calls your program. You might keep your heap size a little bit smaller by setting GOGC to a smaller value than 100.  Here is a helpful article: https://dave.cheney.net/tag/godebug

Miles Hex

unread,
Feb 11, 2021, 3:01:52 AM2/11/21
to golang-nuts
To be honest, i'm not totally sure, what is see is that memory keep raising until the device crash for out of memory, but what does it mean if reachable object remain constant, but the " Heap objects" keep raising? Even when i manualy trigger the GC!
Also the reachable object is not exported from runtime.ReadMemStats(), and i'm not sure that the stats from the heapview software are correct.
Thankyou for your help

Uli Kunitz

unread,
Feb 11, 2021, 6:39:55 AM2/11/21
to golang-nuts
You are writing: The device crashes with out of memory. What does crash? The Go program, another program or the kernel? Can you share the error message? Can you share the log file for one day of GODEBUG=gctrace=1?

In bash do:

$ export GODEBUG=gctrace=1
$ <your-program-that-calls-the-Go-program> 2>error.log

Tom Mitchell

unread,
Feb 11, 2021, 12:11:48 PM2/11/21
to Uli Kunitz, golang-nuts
On Thu, Feb 11, 2021 at 3:40 AM Uli Kunitz <uli.k...@gmail.com> wrote:
You are writing: The device crashes with out of memory. What does crash? The Go program, another program or the kernel? Can you share the error message? Can you share the log file for one day of GODEBUG=gctrace=1?

In bash do:

$ export GODEBUG=gctrace=1
$ <your-program-that-calls-the-Go-program> 2>error.log

On Thursday, February 11, 2021 at 9:01:52 AM UTC+1 Miles Hex wrote:
To be honest, i'm not totally sure, what is see is that memory keep raising until the device crash for out of memory, but what does it mean if reachable object remain constant, but the " Heap objects" keep raising?

Objects plural?
What is the size of each heap object.  With garbage collection it can help to have as few sizes as possible. Pad objects so when collected the free leaves a useful size space in the heap.   Linux can allow the VM system to over allocate resources and only discover when written and the page fault has no backing store.  Add swap to file and or tune the VM system.   Limits -- set limits low enough for the program so it cannot dominate system physical resources. 
Add signal handler code to dump a summary of all your live objects.  Recursion? Socket(state)? Events arrive faster than event handling?  Locks/Mutex: is housekeeping getting CPU time [phone booth rule, give priority to leaving]?  


--
          T o m    M i t c h e l l ( o n   N i f t y E g g )

Robert Engels

unread,
Feb 11, 2021, 12:27:00 PM2/11/21
to Tom Mitchell, Uli Kunitz, golang-nuts
Are you using CGo or libraries that use CGo? If so, you probably have an off-heap, i.e malloc() leak  

On Feb 11, 2021, at 11:11 AM, Tom Mitchell <mi...@niftyegg.com> wrote:


--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CAAMy4UQWV1Gt9UZBpjpAx01mhL5HEr8NCtGX1rnr1Hevui1Eiw%40mail.gmail.com.

Miles Hex

unread,
Feb 21, 2021, 9:58:11 AM2/21/21
to golang-nuts
Hi, thanks to everyone for the suggestions,

It seems that there is in fact not a memory leak (still not totally sure), but for some reason, the go runtime keeps allocating memory event when is not needed, I think the problem is in the interaction from the memory model of the go runtime with a Linux system with low ram (128MB).

If I force a low memory situation (by allocating memory in another program), the go runtime give back most of his memory (goes from 23 MB to 6MB),  but before that happen the system become very unstable at the point that I'm unable to spawn new process (related to issue #31936), and other services start to crash(not yet sure why) or the OOMkiller start killing process. go 1.16, should "fix" this, with how it releases memory back to the system (not yet tested).

I'm still a bit perplexed by the numbers from runtime. memstat,  I don't help that my code does a lot of small, short-lived allocations, which makes analysis a lot harder, I will now enable memstat metric even in production hoping that will help, and working on reducing allocation in some hot path, to make the output less busy.

@Tom Mitchell: yes objects plural, sorry. i will try your suggestions.

a couple of notes:
- I don't use CGO
- I, unfortunately, can't change the environment variable, so I can use "GODEBUG=gctrace=1" only on my machine which runs/behave differently in virtue of not having the same hardware, and does not present the same memory problems/behavior, I also can't launch myself with the new variable(for a bunch of reasons that are a bit hard to explain, without going in many details), I have a couple of idea on how to fix it but I wasn't able to,
- On this device run other services, that for me are black boxes, some of which seem to crash in low memory situation but I don't know yet why (i emailed the respective developers and awaiting a response)

I will update, if I find something.
Again thanks to everyone.
Reply all
Reply to author
Forward
0 new messages