memory leak, heap size doesn't match process res size

Kane Kim

Feb 13, 2018, 1:55:53 PM
to golang-nuts
The process shows up as using 11 GB of resident memory. It doesn't use cgo and was built with Go 1.9.
Heap debug shows:
# runtime.MemStats
# Alloc = 535278016
# TotalAlloc = 113090364120
# Sys = 583994616
# Lookups = 221159
# Mallocs = 499797278
# Frees = 492733492
# HeapAlloc = 535278016
# HeapSys = 543293440
# HeapIdle = 3776512
# HeapInuse = 539516928
# HeapReleased = 2899968
# HeapObjects = 7063786
# Stack = 5111808 / 5111808
# MSpan = 9703832 / 9797632
# MCache = 3472 / 16384
# BuckHashSys = 2118509
# GCSys = 21561344
# OtherSys = 2095499
# NextGC = 1000557040
# LastGC = 1518547848551628001

Both the memory profile and the heap debug output show ~500 MB of memory in use, and the idle heap is pretty low. Given all that and the lack of cgo, where could the remaining 10.5 GB have gone, and what tools are there to debug it?
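
For completeness, a minimal sketch of how figures like the ones above can be collected in code and compared with what the OS reports (this is not the actual code of the process in question):

package main

import (
    "fmt"
    "runtime"
)

func main() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)

    // HeapAlloc/HeapInuse describe the Go heap; Sys is everything the Go
    // runtime has obtained from the OS. If the process RSS is far above Sys,
    // the extra memory was mapped outside the runtime's accounting.
    fmt.Printf("HeapAlloc = %d\n", m.HeapAlloc)
    fmt.Printf("HeapInuse = %d\n", m.HeapInuse)
    fmt.Printf("HeapIdle  = %d\n", m.HeapIdle)
    fmt.Printf("Sys       = %d\n", m.Sys)
}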

Tamás Gulácsi

Feb 13, 2018, 3:18:36 PM
to golang-nuts
TotalAlloc minus everything that has since been freed gives what is currently in use.
TotalAlloc is the total number of bytes allocated over the lifetime of the process; every allocation is counted, even ones that have long since been freed.
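
As a rough sanity check against the figures posted above (assuming the dump was taken at a single point in time):

Mallocs - Frees = 499,797,278 - 492,733,492 = 7,063,786, which matches HeapObjects, so the live-object accounting is consistent.
TotalAlloc (~113 GB) is cumulative over the lifetime of the process; the live heap (HeapAlloc) is only ~535 MB, and Sys (~584 MB) is everything the runtime has requested from the OS, far below the 11 GB resident size.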

Kane Kim

Feb 13, 2018, 4:51:14 PM
to golang-nuts
The main thing is that the memory doesn't show up in the heap profile, yet it still occupies resident memory.

Peter Waller

Feb 14, 2018, 6:05:26 AM
to Kane Kim, golang-nuts
First, a sanity check: how are you measuring the resident set? Are you certain the memstats data you presented are from the same run as, and were taken after, hitting 10 GB resident?

What you have described doesn't immediately fit my mental model. Could you be measuring the VM size rather than the resident size? The thing that puzzles me a little is that your "Sys =" field is so much lower than 10 GB, assuming the measurements were taken at the same time.

Have you tried calling runtime/debug.FreeOSMemory and playing with GOGC (described in the runtime and runtime/debug package docs)? Do either of those make any difference? You might also want to try running with GODEBUG=gctrace=1.
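
A minimal sketch of the idea, assuming you can drop a debug hook somewhere in the process (gctrace is separate: start the binary with GODEBUG=gctrace=1 in its environment):

package main

import (
    "fmt"
    "runtime"
    "runtime/debug"
)

func main() {
    debug.SetGCPercent(50) // more aggressive than the default GOGC=100
    debug.FreeOSMemory()   // forces a GC and returns as much memory to the OS as possible

    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    fmt.Printf("HeapReleased = %d, Sys = %d\n", m.HeapReleased, m.Sys)
}

If Sys and the resident size barely move after FreeOSMemory, the memory is probably not under the Go runtime's control at all.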

Kane Kim

Feb 14, 2018, 11:15:15 AM
to golang-nuts
I've measured the resident size multiple times, with top, /proc/<pid>/smaps, and pmap. There is a huge allocated region towards the end of the virtual address space:
7fba77bb0000-7fbc3bf37000 rw-p 00000000 00:00 0
Size:            7409180 kB
Rss:             7408840 kB
Pss:             7408840 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:   7408840 kB
Referenced:      7179916 kB
Anonymous:       7408840 kB
AnonHugePages:    243712 kB

I've tried to force a GC through /debug/pprof/heap?gc=1; it didn't have any effect (the GC ran, but nothing was freed).
This doesn't fit my understanding either. If we don't use cgo, shouldn't all memory be reported somewhere?
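
If it helps anyone, here is a rough, Linux-only sketch of how such a region can be spotted from inside the process by scanning /proc/self/smaps for large mappings (the 100 MB threshold is arbitrary):

package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
    "strconv"
    "strings"
)

func main() {
    f, err := os.Open("/proc/self/smaps") // Linux-specific
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    const thresholdKB = 100 * 1024 // report mappings with more than ~100 MB resident
    var header string
    sc := bufio.NewScanner(f)
    for sc.Scan() {
        fields := strings.Fields(sc.Text())
        if len(fields) == 0 {
            continue
        }
        if strings.Contains(fields[0], "-") {
            // Mapping header, e.g. "7fba77bb0000-7fbc3bf37000 rw-p 00000000 00:00 0".
            header = sc.Text()
            continue
        }
        if fields[0] == "Rss:" && len(fields) >= 2 {
            if kb, err := strconv.Atoi(fields[1]); err == nil && kb > thresholdKB {
                fmt.Printf("%s\n  Rss: %d kB\n", header, kb)
            }
        }
    }
}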

Peter Waller

Feb 15, 2018, 5:14:55 AM
to Kane Kim, golang-nuts
On 14 February 2018 at 16:15, Kane Kim <kane....@gmail.com> wrote:
If we don't use cgo, shouldn't all memory be reported somewhere?

Well, assuming no part of your software calls mmap and there isn't something else funny going on.

Can you capture when this large region is mapped with strace? At what point in the process lifecycle does this happen? Perhaps you can use a debugger (dlv) to catch the culprit in the act. Perhaps you could come up with a mechanism to unmap the region and get a segfault and stack trace when the culprit tries to use it.

If you find out what's going on, please share if you can; it sounds like an interesting issue.

Kane Kim

Feb 15, 2018, 10:16:42 AM
to golang-nuts
I think I've got a clue: I've discovered that the process in question was built with the -race flag. Apparently the race detector's structures (TSan shadow memory) are not visible to the Go runtime and are not reported.
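
For anyone who wants their binaries to report this, a small sketch of one way to detect a -race build at run time, using the race build constraint (the file names are only illustrative):

// raceenabled_race.go: compiled only when the binary is built with -race.
// +build race

package main

const raceEnabled = true

// raceenabled_norace.go: compiled for normal builds.
// +build !race

package main

const raceEnabled = false

Logging raceEnabled at startup would have flagged this build immediately. (On Go 1.17 and later the same constraint is written as //go:build race.)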

Kane Kim

Feb 15, 2018, 10:21:12 AM
to golang-nuts
I was surprised to see that the overhead is so significant. Is there any way to peek into that and see what is taking up space in the race detector? I've tried --inuse_objects on a heap dump, but didn't see anything suspicious there.

Ian Lance Taylor

Feb 15, 2018, 11:01:29 AM
to Kane Kim, golang-nuts
On Thu, Feb 15, 2018 at 7:21 AM, Kane Kim <kane....@gmail.com> wrote:
>
> I was surprised to see that the overhead is so significant. Is there any
> way to peek into that and see what is taking up space in the race detector?
> I've tried --inuse_objects on a heap dump, but didn't see anything suspicious there.

The race detector works by using shadow memory to track the status of
the real memory. So in effect it multiplies the total amount of
possible memory usage by a constant multiplier (though I don't know
what that multiplier is offhand). This shadow memory is not part of
the Go heap so the Go heap inspection tools will not help.

Ian

azha...@gmail.com

Apr 4, 2020, 4:29:46 PM
to golang-nuts
Have you figured out the root cause? We're experiencing the same issue.