pprof's heap reports 964MB heap when HeapAlloc is 2GB

414 views
Skip to first unread message

schmichael

unread,
Feb 9, 2015, 6:03:06 PM2/9/15
to golan...@googlegroups.com
I'm trying to diagnose a memory leak and am having a hard time reconciling memory usage numbers from go's various tooling.

"go tool http://localhost:4210/debug/pprof/heap" reports on 78591MB of 964.49MB total.

MemStats reports Alloc/HeapAlloc are ~2GB.

Where does the pprof heap total come from? Why the large discrepancy?

I'm using a single clib via cgo which I doubt is the issue as it's been in use in this system since long before this leak appeared.

go version go1.4.1 linux/amd64

Full memstats:

{
  "Alloc": 2078897216,
  "TotalAlloc": 19402061080824,
  "Sys": 4351501256,
  "Lookups": 10146574883,
  "Mallocs": 316945514953,
  "Frees": 316927181816,
  "HeapAlloc": 2078897216,
  "HeapSys": 3721003008,
  "HeapIdle": 1054179328,
  "HeapInuse": 2666823680,
  "HeapReleased": 30236672,
  "HeapObjects": 18333137,
  "StackInuse": 319954944,
  "StackSys": 321388544,
  "MSpanInuse": 39870336,
  "MSpanSys": 40779776,
  "MCacheInuse": 8800,
  "MCacheSys": 16384,
  "BuckHashSys": 6988072,
  "GCSys": 247246848,
  "OtherSys": 14078624,
  "NextGC": 2539201696,
  "LastGC": 1423522601051279000,
  "PauseTotalNs": 4292761263230,
  "NumGC": 47192,
  "EnableGC": true,
  "DebugGC": false
}

Stephen Gutekanst

unread,
Feb 9, 2015, 6:42:14 PM2/9/15
to golan...@googlegroups.com
Quoting the docs for runtime.MemStats:
        // General statistics.
        Alloc      uint64 // bytes allocated and still in use
        TotalAlloc uint64 // bytes allocated (even if freed)
        Sys        uint64 // bytes obtained from system (sum of XxxSys below)
        Lookups    uint64 // number of pointer lookups
        Mallocs    uint64 // number of mallocs
        Frees      uint64 // number of frees

Thus, in your case:
  • 2 GB allocated and still in use.
  • 19402 GB (?!) allocated (even if freed).
  • 4.35 GB obtained from system.
It also claims that you had a last GC pause of 71 minutes:

 "PauseTotalNs": 4292761263230,

If you run the app with GODEBUG=gctrace=1 -- do you really experience a 71-minute GC pause? If not, I'd point my finger towards potential memory corruption.

Cheers,
Stephen

Stephen Gutekanst

unread,
Feb 9, 2015, 6:44:41 PM2/9/15
to golan...@googlegroups.com
Also, note that pprof/MemStats does not report memory allocated outside the Go runtime (i.e. CGO).

What does e.g. `top` show ?

schmichael

unread,
Feb 9, 2015, 7:09:40 PM2/9/15
to golan...@googlegroups.com
The system nearly matches Sys:

Sys: 5018 MB
RSS 5037 MB
VSZ: 5868 MB

I'll be looking into the GC next! This is a distributed work system where other nodes will claim work if the current owner doens't heartbeat within a deadline.... and sure enough the node I'm inspecting lost all of its work due to a missed deadline.

schmichael

unread,
Feb 9, 2015, 7:14:44 PM2/9/15
to golan...@googlegroups.com
Quick clarification: The total GC pause time is 71 minutes, not the last time.

That being said it does appear this process is producing excessive garbage.

Stephen Gutekanst

unread,
Feb 9, 2015, 7:26:15 PM2/9/15
to schmichael, golang-nuts
Accidentally replied off-list:

Well, the fact that the system reports the same as go does is a good indicator that the problem is a 'leak' in your go code and not with your cgo lib. Otherwise your sys would report much higher than go/pprof.

I still wouldn't rule out corruption by your cgo though. Is it's source available on the web so I can view it?


--
You received this message because you are subscribed to a topic in the Google Groups "golang-nuts" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/golang-nuts/m-djT4LotdM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Follow me on twitter @slimsag.

schmichael

unread,
Feb 9, 2015, 7:45:02 PM2/9/15
to golan...@googlegroups.com, michael....@gmail.com
This is the geoip library we use: https://github.com/abh/geoip

This is the distributed work library we use: https://github.com/lytics/metafora

Sadly 99.9% of the code - and where I suspect the leak lies if not in cgo - is proprietary.

Stephen Gutekanst

unread,
Feb 9, 2015, 8:11:42 PM2/9/15
to schmichael, golang-nuts
Michael,

I took a light look / overview of the geoip bindings -- nothing obvious sticks out as far as misuse of CGO. It looks very straight-forward and they appear to alloc/free appropriately.

You might also consider looking at the goroutine profile -- how many goroutines are running? Maybe they are being spawned off at a fast rate and never exiting (hence you'd leak their stack + in-scope vars).

pprof or running with `GODEBUG=schedtrace=1000` should give you a good idea of the running goroutines.

You could also hit the process with `kill -SIGQUIT pid` and look at the stack trace to see if it's doing anything you think it shouldn't be.

Without more information -- I'm just taking blind guesses =)

Hope this helps,
Stephen

schmichael

unread,
Feb 9, 2015, 8:50:51 PM2/9/15
to golan...@googlegroups.com, michael....@gmail.com
Thanks for taking a look! I'm off for the day, but I think gourtines arethe main line of inquiry tomorrow. Our goroutine dumps (from net/http/pprof's goroutine?debug=2 output) can reach 11MB of text.

Dave Cheney

unread,
Feb 9, 2015, 10:28:52 PM2/9/15
to golan...@googlegroups.com
Can you post the contents of that dump? We had another bug report recently where this information was crucial to discovering the underlying cause.

Dmitry Vyukov

unread,
Feb 10, 2015, 3:28:54 AM2/10/15
to schmichael, golang-nuts
This was discussed several times.
In short, Sys-HeapReleased is total memory consumption
HeapSys-HeapReleased is memory consumption by heap.
Pprof number is smaller due to several reasons. But that is not
important if you are just trying to find leaks.
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
Reply all
Reply to author
Forward
0 new messages