surprisingly high (persistent) RSS usage after a spike in memory; questions about allocator debugging

thomas....@googlemail.com

Jun 12, 2020, 4:54:47 PM
to golang-nuts
Hey everybody,

I am looking at a Go process where the total number of bytes managed by Go's heap in the long run is around 20-30 megs, but the RssAnon reported in /proc/self/status is extremely high (800+ megs). I am trying to figure out what is going on, and have a few questions about the proper way to debug such situations.

Background:
The process does a fair number of allocations & computations on startup, and after it finishes, settles into a quiet life of sporadic allocation and occasionally reacting to events. This means there is a pronounced spike of memory consumption at the beginning, followed by a long period (months) of low memory consumption. Unfortunately, what I am seeing is a failure of the garbage collector to release memory back to the OS, even after hours of waiting.
I am periodically logging Alloc & Sys from the runtime's memory statistics, along with RssAnon and RssFile from /proc/self/status, and RssAnon spikes up to 800+ megs and never goes down thereafter. I have plotted the evolution of these values over time here: https://twitter.com/halvarflake/status/1271538641878290432/photo/1
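A minimal sketch of this kind of logging, assuming Linux and a recent Go toolchain. The helper names `parseStatusKB` and `logSample` are made up for illustration; a long-running process would call `logSample` from a ticker instead of once. `HeapIdle` and `HeapReleased` are printed as well because they show how much idle heap the runtime is holding and how much it has already handed back.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// parseStatusKB extracts the fields reported in kB (e.g. RssAnon, RssFile)
// from the text of /proc/<pid>/status, keyed by field name.
func parseStatusKB(status string) map[string]uint64 {
	out := make(map[string]uint64)
	for _, line := range strings.Split(status, "\n") {
		fields := strings.Fields(line) // e.g. ["RssAnon:", "1234", "kB"]
		if len(fields) != 3 || fields[2] != "kB" {
			continue
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			continue
		}
		out[strings.TrimSuffix(fields[0], ":")] = v
	}
	return out
}

// logSample prints one line combining the Go runtime's view of memory
// with the kernel's view of the process.
func logSample() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	kb := map[string]uint64{}
	// /proc is Linux-specific; on other systems the Rss fields print as 0.
	if data, err := os.ReadFile("/proc/self/status"); err == nil {
		kb = parseStatusKB(string(data))
	}

	fmt.Printf("Alloc=%dkB Sys=%dkB HeapIdle=%dkB HeapReleased=%dkB RssAnon=%dkB RssFile=%dkB\n",
		m.Alloc/1024, m.Sys/1024, m.HeapIdle/1024, m.HeapReleased/1024,
		kb["RssAnon"], kb["RssFile"])
}

func main() {
	logSample()
}
```

If `HeapReleased` is large while RssAnon stays high, the runtime has already given the memory back and the kernel simply has not reclaimed it yet.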

My theory is that we are somehow hitting a pessimal situation where we leave just enough allocations alive to ensure the garbage collector cannot release anything. To confirm this hunch, there are two things I'd like to do, but I am quite unclear how to achieve them:

1. If I could get a list of all live objects, I should be able to see how "scattered" they are through memory, and how this leads to a failure to release memory. I added some code to call debug.WriteHeapDump() at regular intervals, and also when our Go-reported memory usage spikes. Once I had done so, I tried to find a library or tool to parse the heap dumps, and failed to find one. Is there anything out there that I can use, or does that still need to be written?

2. To analyze heap fragmentation and heap layout issues in C/C++, I have lightweight infrastructure that logs *all* allocations, sizes, and deallocations in a compact binary format into a buffer; when that buffer is full, the process forks and the child writes the data. I then have tooling to draw (large) diagrams from this where the x-axis is time and the y-axis address space, and free/live allocations are drawn as rectangles. An example of such a diagram is here: https://twitter.com/halvarflake/status/1075156510555168769/photo/1
I have found such diagrams to be immensely helpful when diagnosing allocation pathologies and interacting with complex heap layouts, and would love to gather similar data for my Go processes. The easiest for me would be if I had the ability to add a call to my (C-based) shared library into mallocgc; since that code eats almost no stack and is very much under my control, this should (at least theoretically) be doable without all the bells and whistles of an FFI. Is there a (not-production-safe, hackish etc.) way of doing that that is not quite as bad as patching a hook into the binary? Or is there even a way to get a callback from mallocgc and the freeing functions to build such logging, provided one does not perform any heap operations in the callback?
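For reference, the heap-dump step from item 1 can be sketched roughly as follows. This is an illustration under assumptions, not the code used in the process described above: the helper name `dumpHeapToTemp` and the temp-file naming are hypothetical. `debug.WriteHeapDump` takes a file descriptor and suspends all goroutines while it writes; its documentation points to https://golang.org/s/go15heapdump for the dump format.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
)

// dumpHeapToTemp writes a heap dump into a new temporary file and returns
// the file's path and its size in bytes. (Hypothetical helper.)
func dumpHeapToTemp() (string, int64, error) {
	f, err := os.CreateTemp("", "heapdump-*.bin")
	if err != nil {
		return "", 0, err
	}
	defer f.Close()

	// WriteHeapDump stops the world until the dump has been written to fd.
	debug.WriteHeapDump(f.Fd())

	info, err := f.Stat()
	if err != nil {
		return f.Name(), 0, err
	}
	return f.Name(), info.Size(), nil
}

func main() {
	path, size, err := dumpHeapToTemp()
	if err != nil {
		fmt.Fprintln(os.Stderr, "heap dump failed:", err)
		os.Exit(1)
	}
	fmt.Printf("wrote %d bytes to %s\n", size, path)
}
```

Because the whole process is paused for the duration of the dump, calling this at regular intervals is only reasonable for debugging builds.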

Does anyone have any advice?

Cheers,
Thomas

Uli Kunitz

Jun 12, 2020, 6:20:07 PM
to golang-nuts
It looks like your program allocates a lot of memory once. Note that even after Go has told the OS via MADV_FREE that memory can be reclaimed, the OS will only reclaim it when there is actual memory pressure, so RSS can stay high until then.

I suggest reading the documentation of GODEBUG at https://golang.org/pkg/runtime/. The scavtrace parameter might be interesting for you, because it reports the amount of RAM returned to the OS.
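A small experiment along these lines, as a sketch only: it allocates a spike, drops it, asks the runtime to return memory with `debug.FreeOSMemory`, and compares `MemStats.HeapReleased` before and after. The function name `spikeAndRelease` and the 64 MiB size are arbitrary choices for illustration.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// sink keeps the spike reachable until we deliberately drop it.
var sink []byte

// spikeAndRelease allocates n bytes, touches every page, then frees the
// buffer and asks the runtime to return idle memory to the OS. It returns
// HeapReleased as seen at the peak and after the release.
func spikeAndRelease(n int) (atPeak, afterFree uint64) {
	sink = make([]byte, n)
	for i := 0; i < len(sink); i += 4096 {
		sink[i] = 1 // fault the page in so it counts towards RSS
	}

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	atPeak = m.HeapReleased

	sink = nil
	debug.FreeOSMemory() // forces a GC, then returns as much as possible

	runtime.ReadMemStats(&m)
	afterFree = m.HeapReleased
	return atPeak, afterFree
}

func main() {
	atPeak, afterFree := spikeAndRelease(64 << 20)
	fmt.Printf("HeapReleased at peak: %d KiB, after FreeOSMemory: %d KiB\n",
		atPeak/1024, afterFree/1024)
}
```

Running it with `GODEBUG=scavtrace=1` prints the scavenger's own accounting to standard error. On Linux with Go 1.12 through 1.15, released pages are marked with MADV_FREE by default, so RSS may not drop even though `HeapReleased` grows; setting `GODEBUG=madvdontneed=1` makes the runtime use MADV_DONTNEED instead, which the kernel reflects in RSS immediately.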
