Tracking down RSS growth in cgo library


Justin Israel

May 23, 2020, 1:20:58 AM
to golang-nuts
I've been working to track this down for two days now, and I'm taking a long shot to see if anyone might have a new idea for me.
My cgo-based bindings library seems to have unbounded RSS memory growth. I have been able to reduce it to a minimal benchmark test and even pinpoint the exact call, but the reason behind it still eludes me.
The context is that I have a struct on the C++ side that stores a const char* for the last exception that was thrown, as a strdup() copy that gets freed properly each time it is replaced.
typedef struct _HandleContext {
    HandleId handle;
    const char* last_error;  // strdup() copy of the last exception message; freed each time it is replaced
} _HandleContext;

// Returns the stored message, or NULL if no error has occurred.
const char* getLastError(_HandleContext* ctx);
On the Go side, I have a lastError() method that returns the last error value:
func (c *Config) lastError() error {
    err := C.getLastError(c.ptr)
    if err == nil {
        return nil
    }
    // Copy the C string into Go-managed memory.
    e := C.GoString(err)
    // Keep c (and the C memory its pointer owns) alive until the copy is done.
    runtime.KeepAlive(c)
    if e == "" {
        return nil
    }
    // return nil  // <- would result in no memory growth
    return errors.New(e)  // <- results in memory growth
}
What I am seeing in my benchmark test is that the RSS grows at something like 20 MB per second, yet the GODEBUG=gctrace=1 output and the pprof memory profile don't reflect this usage at all, aside from pprof showing a hotspot at the GoString() call:
gc 4 @4.039s 0%: 0.006+0.14+0.003 ms clock, 0.024+0.10/0.039/0.070+0.014 ms cpu, 4->4->0 MB, 5 MB goal, 4 P
gc 5 @6.857s 0%: 0.003+0.20+0.004 ms clock, 0.015+0.069/0.025/0.15+0.016 ms cpu, 4->4->0 MB, 5 MB goal, 4 P
...
gc 26 @69.498s 0%: 0.005+0.12+0.003 ms clock, 0.021+0.10/0.044/0.093+0.014 ms cpu, 4->4->0 MB, 5 MB goal, 4 P
// 800MB RSS usage
gc 27 @71.824s 0%: 0.006+2.2+0.003 ms clock, 0.025+0.063/0.058/0.11+0.014 ms cpu, 4->4->0 MB, 5 MB goal, 4 P
(pprof) top10
Showing nodes accounting for 46083.69kB, 100% of 46083.69kB total
Showing top 10 nodes out of 19
      flat  flat%   sum%        cum   cum%
30722.34kB 66.67% 66.67% 30722.34kB 66.67%  <...>._Cfunc_GoStringN (inline)
 7168.11kB 15.55% 82.22%  7168.11kB 15.55%  errors.New (inline)
 3073.16kB  6.67% 88.89% 46083.69kB   100%  <...>.testLeak
 1536.02kB  3.33% 92.22%  1536.02kB  3.33%  <...>.(*DisplayTransform).SetInputColorSpace.func1
 1024.02kB  2.22% 94.44%  1024.02kB  2.22%  <...>.(*Config).NumViews.func1
 1024.02kB  2.22% 96.67%  1024.02kB  2.22%  <...>.(*Config).View.func1
  512.01kB  1.11% 97.78%   512.01kB  1.11%  <...>.(*DisplayTransform).SetView.func1
  512.01kB  1.11% 98.89%   512.01kB  1.11%  <...>._Cfunc_GoString (inline)
  512.01kB  1.11%   100%   512.01kB  1.11%  <...>.newProcessor (inline)
         0     0%   100%   512.01kB  1.11%  <...>.(*Config).ColorSpaceNameByIndex
The RSS grows regardless of whether my test ignores the returned error value. If I return nil instead of errors.New(e), it stays around 20MB RSS.
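(One way to see this discrepancy directly, sketched here as an aside and not part of the original test: compare the Go runtime's own heap accounting against the kernel's RSS figure. A large gap points at memory the Go GC cannot see, i.e. C allocations. Linux-only, since it reads /proc/self/status.)

package main

import (
    "bufio"
    "fmt"
    "os"
    "runtime"
    "strings"
)

// vmRSS reads the VmRSS line from /proc/self/status (Linux only).
func vmRSS() string {
    f, err := os.Open("/proc/self/status")
    if err != nil {
        return "unknown"
    }
    defer f.Close()
    sc := bufio.NewScanner(f)
    for sc.Scan() {
        if strings.HasPrefix(sc.Text(), "VmRSS:") {
            return strings.TrimSpace(strings.TrimPrefix(sc.Text(), "VmRSS:"))
        }
    }
    return "unknown"
}

func main() {
    var ms runtime.MemStats
    runtime.ReadMemStats(&ms)
    // HeapAlloc counts only Go-managed memory; anything from malloc/strdup
    // on the C side is invisible here but still shows up in VmRSS.
    fmt.Printf("Go heap: %d KB, process RSS: %s\n", ms.HeapAlloc/1024, vmRSS())
}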

I MUST be doing something stupid, but I can't see any reason for the memory growth based on returning this string wrapped in an error. At first I thought I was leaking in C/C++, but it all led back to this one factor on the Go side. Any tips would help greatly, since I have already tried GC debug output, pprof reports, valgrind, the address sanitizer, and refactoring the entire memory management of my C bindings layer.

Justin


Justin Israel

May 23, 2020, 11:39:24 PM
to golang-nuts
Well, I seem to have resolved the leak, which was due to a poor assumption on my part about how often finalizers execute and how they are tied to a GC cycle. Are finalizers executed on every GC cycle, or are they merely made eligible to run by a GC cycle, possibly executing later?
My library sets finalizers to ensure C memory is freed at some point, but I also have Destroy() methods to free it immediately (and clear the finalizers); a sketch of the pattern follows. So I think it was a combination of the test not generating enough Go garbage to trigger collection of the far more significant C memory, and of me not being explicit enough with Destroy calls. It also looks like I had a situation where I wasn't freeing C strings as eagerly as I could have been, so cleaning those up sooner helps clear them before a Destroy call or a finalizer ever runs.
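(For illustration, the pattern in question looks roughly like this; the names and the C-side allocation are hypothetical, not the real bindings:)

package bindings

/*
#include <stdlib.h>
*/
import "C"

import (
    "runtime"
    "unsafe"
)

type Config struct {
    ptr unsafe.Pointer // owned C allocation
}

func NewConfig() *Config {
    c := &Config{ptr: C.malloc(1024)} // stand-in for the real C-side constructor
    // Safety net only: the GC may run this long after the Config becomes
    // garbage, or never, so callers should not rely on it for timely cleanup.
    runtime.SetFinalizer(c, (*Config).Destroy)
    return c
}

// Destroy frees the C memory immediately and clears the finalizer so it
// cannot fire later on an already-freed pointer.
func (c *Config) Destroy() {
    if c.ptr != nil {
        C.free(c.ptr)
        c.ptr = nil
    }
    runtime.SetFinalizer(c, nil)
}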

As much as I thought I knew about the caveats of finalizers and using them to release C resources, I still likely made faulty assumptions.

Ian Lance Taylor

May 24, 2020, 3:13:27 PM
to Justin Israel, golang-nuts
Finalizers are strictly best effort.

Also, if a Go memory object has a finalizer that frees a C memory
object, it's important to note that the Go garbage collector will be
basing decisions only on the amount of memory used by the Go object,
and will not consider the amount of memory used by a C object. It's
easy for small Go objects to be associated with large C objects, and
for the Go garbage collector to decide that there is no need to do
anything because not much Go memory is in use, even while an enormous
amount of C memory is being kept alive.
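(A tiny sketch of that situation, with hypothetical sizes: each Go handle below is a few dozen bytes, so the collector sees almost no heap pressure, yet every handle pins 100 MB of C memory until its finalizer eventually runs:)

package main

/*
#include <stdlib.h>
#include <string.h>
*/
import "C"

import (
    "runtime"
    "unsafe"
)

type bigHandle struct {
    ptr unsafe.Pointer
}

func main() {
    for i := 0; i < 10; i++ {
        h := &bigHandle{ptr: C.malloc(100 << 20)}
        C.memset(h.ptr, 1, 100<<20) // touch the pages so they actually count toward RSS
        runtime.SetFinalizer(h, func(h *bigHandle) { C.free(h.ptr) })
        // Each iteration adds only a few dozen bytes to the Go heap, so the
        // GC feels no pressure and may not collect h (or run its finalizer)
        // for a long time, while the process RSS has grown by a gigabyte.
    }
    runtime.GC() // even an explicit GC only queues the finalizers; it does not wait for them
}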

Ian

Justin Israel

May 24, 2020, 3:51:24 PM
to Ian Lance Taylor, golang-nuts
Yeah, I should have known better.
About the frequency of finalizers: can you confirm whether the finalizers run during every GC cycle, regardless of when that GC cycle decides to happen? Or can a GC cycle run but decide not to process the finalizers yet?

Ian Lance Taylor

May 25, 2020, 4:39:55 PM
to Justin Israel, golang-nuts
On Sun, May 24, 2020 at 12:50 PM Justin Israel <justin...@gmail.com> wrote:
>
> About the frequency of finalizers: can you confirm whether the finalizers run during every GC cycle, regardless of when that GC cycle decides to happen? Or can a GC cycle run but decide not to process the finalizers yet?

The question doesn't have a simple answer. The garbage collector runs
concurrently with the program. The exact details aren't too important
(and I'm not sure I fully understand them) but the collector will
periodically identify blocks of memory that have been fully marked,
such that any unmarked object in that block is definitely free. If
one of those unmarked objects has a finalizer, the collector will mark
the object so that it will not be freed after all, will remove the
finalizer from the object, and will queue the finalizer to be run.
The finalizer queue will hold a reference to the object, so it won't
be freed while the finalizer is queued. There is a separate goroutine
that runs finalizers, and queuing a finalizer wakes up that goroutine.
When that goroutine is scheduled, it will get the finalizer off the
queue and run it.

I'm not sure how to take that process and turn it into an answer to
your question.
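(A small experiment, not from the thread, that makes the queueing behaviour observable: runtime.GC() itself only queues the finalizer; it runs later, on a separate goroutine, whenever that goroutine is scheduled:)

package main

import (
    "fmt"
    "runtime"
)

func main() {
    done := make(chan struct{})
    obj := new([64]byte)
    runtime.SetFinalizer(obj, func(*[64]byte) { close(done) })
    obj = nil // drop the only reference

    runtime.GC() // a full cycle: obj should be found unreachable and its finalizer queued
    select {
    case <-done:
        fmt.Println("finalizer already ran")
    default:
        fmt.Println("finalizer queued but not yet run")
    }
    <-done // returns once the finalizer goroutine has run it (done is closed)
    fmt.Println("finalizer ran")
}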

Ian