Service using much more system memory than runtime.MemStats is showing


Blake Caldwell

Aug 26, 2014, 9:02:34 AM
to golan...@googlegroups.com
I'm having a hard time understanding why my service is using up 400MB-1GB of system memory, as reported by top on Linux, and Activity Monitor on OS X. I figured I'm leaking goroutines or my 32kb buffers, so I tried:
- reusing memory buffers (no change)
- monitoring goroutines and heap with net/http/pprof listening on 6060 - no leaked goroutines and memory was as expected (matching my graphs below)
- exiting with panic to view the lingering goroutines - none
- runtime.MemStats graphing (see the graphs below; a rough sketch of the sampling loop follows this list)
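
For reference, the sampling side of that graphing is roughly this shape (a simplified sketch, not my exact code - the fields come straight from runtime.MemStats):

    package main

    import (
        "log"
        "runtime"
        "time"
    )

    func main() {
        var m runtime.MemStats
        for {
            runtime.ReadMemStats(&m)
            // Sys is what the runtime has obtained from the OS; Alloc is live heap.
            log.Printf("alloc=%dMB sys=%dMB heap-objects=%d num-gc=%d",
                m.Alloc>>20, m.Sys>>20, m.HeapObjects, m.NumGC)
            time.Sleep(10 * time.Second)
        }
    }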

I used the info from this post to graph my memory usage, and found no misbehavior:


Here's the service after a short time:


And here it is after running all night:

This is exactly as I'd expect - about 40MB of memory being used. The sudden drop-offs are the GC doing its job - I verified this by playing with GOGC. 

However... When I use top (Linux and OSX) or Activity Monitor (OSX), the service quickly ramps up to 200MB, then slowly climbs to over a GB when running overnight. It looks like it then dropped down to 400MB and started climbing again. 

This will be a long-running service, so I need to understand the discrepancy between expected and actual system memory usage. I'm clearly missing something. Any insight?

Carlos Castillo

Aug 26, 2014, 4:10:09 PM
to golan...@googlegroups.com
It's good that your graphs are using the Sys value, since that is probably the most accurate value for the amount of memory go has received from the OS.

I'm not sure if you are aware, but the Go runtime will never free memory back to the OS: neither the GC nor the scavenger will do so in a way that is reflected in most OS statistics. This just means the memory use as viewed from outside the program will never go down on its own for Go-managed memory (Go does make it easier for the OS to reclaim memory from your program by marking it unused).

That being said, from your description it's likely that, short of a bug in the statistics, you are using memory that is not tracked by the Go runtime. If you use cgo, any memory allocated by C/C++ code will not be tracked by Go memory statistics. For example, if you are using gopkg.in/qml.v1, all memory used by the UI, including for example any decoded images sent to QML, is stored in memory not tracked by Go. If you use a database with a driver that is not pure Go (eg: most in-process databases like sqlite), its memory use for caching and storing values is not tracked.

Similarly, if you use syscalls that allocate memory, directly or through packages, that memory won't be tracked either. For example, under some usage conditions mmap can allocate memory that counts against the process (although AFAIK its default mode of memory-mapping files usually doesn't).

Carlos Castillo

Aug 26, 2014, 4:36:35 PM
to golan...@googlegroups.com
Also there are two reasons why the techniques you are applying don't seem to have any effect on max memory use:
  1. If the Go statistics are correct, they can only affect the 40MB of memory that the Go code itself is using, eg: 1/5th of the total memory use reported by the system.
  2. Memory use never goes down, and the GC will try to keep it within a limit of 2x the greatest amount of memory actually in use at any given time.
For #2, essentially when the GC runs, it finds all the memory in use at that time, and sets 2x that as the threshold for the next garbage collection. This means that even if you re-use buffers and perform other memory tricks, there is still a minimum amount of memory your code needs in order to proceed. In this case that is likely around 20MB, and the longer the app runs, the greater the chance that a GC happens at a moment when all 20MB is active. At that point the memory limit is set to double that (40MB), and although the limit may drop later, your program will need to use all the memory up to that limit before the GC is called again (to reset the limit). Since memory is never returned by Go (as I mentioned before), that means that all 40MB will be acquired at some point.
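
A quick way to see that threshold for yourself (illustrative only - this just reads the NextGC field from MemStats; needs "fmt" and "runtime"):

    func printGCTarget() {
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        // With the default GOGC=100, NextGC is roughly twice the heap that was
        // live at the end of the previous collection.
        fmt.Printf("live heap: %dMB, next GC target: %dMB\n", m.HeapAlloc>>20, m.NextGC>>20)
    }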

Using those techniques is often still worthwhile though: the more you re-use memory without needing more of it, the less the GC needs to run.
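
For example, one common shape for that kind of buffer re-use is a sync.Pool (just a sketch, the names are made up; needs "io" and "sync"):

    var bufPool = sync.Pool{
        New: func() interface{} { return make([]byte, 32*1024) },
    }

    // copyWithPooledBuffer copies src to dst re-using a pooled 32KB buffer
    // instead of allocating a fresh one per copy.
    func copyWithPooledBuffer(dst io.Writer, src io.Reader) error {
        buf := bufPool.Get().([]byte)
        defer bufPool.Put(buf)
        for {
            n, err := src.Read(buf)
            if n > 0 {
                if _, werr := dst.Write(buf[:n]); werr != nil {
                    return werr
                }
            }
            if err == io.EOF {
                return nil
            }
            if err != nil {
                return err
            }
        }
    }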

One last thing, there is a tool called gcvis (https://github.com/davecheney/gcvis) that allows you to see some of the go memory statistics in graph form without altering your code in any way. It's not as detailed as the tracking you've done, but is useful for a quick diagnosis of new code.

Blake Caldwell

Aug 26, 2014, 4:43:08 PM
to golan...@googlegroups.com
Thanks so much. I hadn't considered memory allocated outside of Go. I'm writing an SSH proxy, using go.crypto/ssh. I'm guessing that's not all in Go, and that I'm using something incorrectly.

Carlos Castillo

Aug 26, 2014, 5:30:33 PM
to Blake Caldwell, golan...@googlegroups.com
The easiest way of finding out what your code is using (to see syscall and cgo use) is to use go list.

Eg: "go list -json" will quickly show you which packages a package imports itself (shown as Imports) and transitively (Deps).

Also if the documentation for a package is visible through godoc.org then at the bottom is a link to see a graph of all imported packages for the package in question. 

This way you should be able to narrow down the possible places where the C (ie: cgo) and syscall packages are used. Note that the standard library uses syscall a lot (from the os package) and cgo in the net and crypto packages, albeit infrequently, though neither case should affect your memory in any unexpected ways. 


--
Carlos Castillo

Dave Cheney

Aug 26, 2014, 11:16:07 PM
to golan...@googlegroups.com
go.crypto/ssh is 100% written in Go.

Are you setting GOMAXPROCS > 1 by any chance?

Blake Caldwell

Aug 26, 2014, 11:32:37 PM
to golan...@googlegroups.com
Nope, using the default for GOMAXPROCS

How about net/http? Also 100% Go?

Dave Cheney

Aug 26, 2014, 11:40:37 PM
to Blake Caldwell, golang-nuts
Yes, with the exception of dns resolution which uses cgo to call your
system name resolver.

I suspect your program has a lot of native threads running; each
consumes several megabytes (the number is highly OS dependent).

As an experiment, try building a version of Go with cgo disabled; this
will revert to the pure Go DNS resolver:

export CGO_ENABLED=0 && ./all.bash

Blake Caldwell

Aug 27, 2014, 12:29:25 AM
to golan...@googlegroups.com, bl...@fogcreek.com
I was able to provoke my service into *really* misbehaving in a weird way. As I mentioned, this is an SSH proxy. I've been very careful to defer close everything, and I've verified these goroutines *are* closing with top-level defers that output to the log.

When I start the service, I'm listening for connections. There's no leak at this point. The service rests at around 15MB.

When I receive a connection, I kick off a goroutine to handle it, and in that goroutine I kick off 3 more - one each for proxying stdout, stderr, and stdin. Again - all of these *do* close when I expect them to. After receiving just one SSH connection, dialing to another server, and successfully proxying between them, leaks start happening like crazy, even when no new connections come in. Here's the thing, though... The leaks are only obvious if I use a big []byte buffer when reading and writing between the different connections' channels. The leak is about 5MB/second when I use any buffer larger than 1MB. This seems to be the upper bound - if I use even bigger buffers, this doesn't leak any worse.
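
The fan-out is roughly this shape (heavily simplified - the real code has logging, defers, and the concrete ssh types; the names here are illustrative, and it needs "io" and "sync"):

    func proxyStreams(clientIn io.Reader, clientOut, clientErr io.Writer,
        remoteIn io.Writer, remoteOut, remoteErr io.Reader) {
        var wg sync.WaitGroup
        wg.Add(3)
        go func() { defer wg.Done(); io.Copy(remoteIn, clientIn) }()   // stdin: client -> remote
        go func() { defer wg.Done(); io.Copy(clientOut, remoteOut) }() // stdout: remote -> client
        go func() { defer wg.Done(); io.Copy(clientErr, remoteErr) }() // stderr: remote -> client
        wg.Wait()
    }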

So... looking at my active goroutines before and after this first SSH connection proxy, I *do* see something interesting, but I haven't the time tonight to dive into the call stacks. These two goroutines weren't there before my first SSH proxy, but they stick around after:

I'm a little confused why I'm seeing net/http here. I *do* make an HTTP request for authorization of the public key during the SSH handshake, but I'm pretty sure that's not the issue. I'm also using pprof and hosting the server on 6060 as I mentioned above. Those should be the only explicit uses of http on my end.
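
(The pprof server is just the standard net/http/pprof pattern, roughly this, so a net/http goroutine or two from that side is expected:)

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers the /debug/pprof/* handlers
    )

    func main() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }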

Any ideas? I know it's tough without sharing my code :(

Dave Cheney

Aug 27, 2014, 12:38:01 AM
to Blake Caldwell, golang-nuts

Yup, it's tough.

Can you share a goroutine dump? Kill your process with a SIGQUIT when you believe it is leaking.
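
If it's easier than SIGQUIT, you can get much the same dump in-process with runtime/pprof (sketch, needs "os" and "runtime/pprof"):

    // debug=2 prints full stacks for every goroutine, in the same format
    // as an unrecovered panic.
    pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)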

Jesse McNelis

Aug 27, 2014, 12:55:21 AM
to Blake Caldwell, golang-nuts

When you say 'leak' what are you referring to?
Are you saying that the memory the runtime reports it's using is that much less than what the OS says the process is using,
and that the difference grows over time?

Blake Caldwell

Aug 27, 2014, 1:21:38 AM
to golan...@googlegroups.com, bl...@fogcreek.com, jes...@jessta.id.au
Okay, now that I've been able to exaggerate this "leak", it's showing up in the pprof charts. So, I'm now seeing two weird things. The first is clearly this unidentified memory usage, and why it's exaggerated when I use bigger buffers to copy between I/O streams. The second is the 4X++ discrepancy between the runtime's stats and the OS. Dave suggested that it might be OS threads - I checked: OSX said the process was using 8 threads.

Here's a recent graph, showing that the runtime *is* aware of the memory usage. It's a ton of little bits that are getting GC'd. 

This now smells like a tight loop somewhere. I just did a 30 second profile, and got this - is this a concern?

Total: 442 samples
     442 100.0% 100.0%      442 100.0% runtime.mach_semaphore_timedwait
       0   0.0% 100.0%      442 100.0% notetsleep
       0   0.0% 100.0%      442 100.0% runtime.MHeap_Scavenger
       0   0.0% 100.0%      442 100.0% runtime.gosched0
       0   0.0% 100.0%      442 100.0% runtime.notetsleepg

It's reassuring that I can chart this madness... but, at 75MB, it still only accounts for about 25% of what the system is reporting, at around 320MB. The OS says I have 8 threads, fwiw.

I still think the two goroutines I included in my previous message are fishy. 

Dave Cheney

Aug 27, 2014, 1:26:46 AM
to Blake Caldwell, golang-nuts, Jesse McNelis
On Wed, Aug 27, 2014 at 3:21 PM, Blake Caldwell <bl...@fogcreek.com> wrote:
Okay, now that I've been able to exaggerate this "leak", it's showing up in the pprof charts. So, I'm now seeing two weird things. The first is clearly this unidentified memory usage, and why it's exaggerated when I use bigger buffers to copy between I/O streams. The second is the 4X++ discrepancy between the runtime's stats and the OS. Dave suggested that it might be OS threads - I checked: OSX said the process was using 8 threads.

Here's a recent graph, showing that the runtime *is* aware of the memory usage. It's a ton of little bits that are getting GC'd. 

This now smells like a tight loop somewhere. I just did a 30 second profile, and got this - is this a concern?

Total: 442 samples
     442 100.0% 100.0%      442 100.0% runtime.mach_semaphore_timedwait
       0   0.0% 100.0%      442 100.0% notetsleep
       0   0.0% 100.0%      442 100.0% runtime.MHeap_Scavenger
       0   0.0% 100.0%      442 100.0% runtime.gosched0
       0   0.0% 100.0%      442 100.0% runtime.notetsleepg


Profiling is busted on OSX, I don't trust it. Please profile on linux if possible.

The fact your memory is growing by 1MB per sample suggests:

a. a new thread is being created per request
b. a new 1MB buffer is being created and is retained somehow.

Can you please run your program with

env GODEBUG=gctrace=1
 

It's reassuring that I can chart this madness... but, at 75MB, it still only accounts for about 25% of what the system is reporting, at around 320MB. The OS says I have 8 threads, fwiw.


Ignore what the operating system is saying, especially on darwin. 

On Linux I find the RSS value is more accurate, but the most accurate value comes from the GODEBUG data above. 


Blake Caldwell

Aug 27, 2014, 1:32:05 AM
to golan...@googlegroups.com, bl...@fogcreek.com, jes...@jessta.id.au
This is during a run with the same behavior. The GC kicks in a couple times at the bottom:

gc1(1): 1+0+188+0 us, 0 -> 0 MB, 18 (19-1) objects, 0/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc2(1): 1+0+100+0 us, 0 -> 0 MB, 339 (340-1) objects, 14/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc3(1): 0+0+111+0 us, 0 -> 0 MB, 761 (934-173) objects, 40/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc4(1): 0+0+154+0 us, 0 -> 0 MB, 1502 (1961-459) objects, 42/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc5(1): 1+0+2219+0 us, 0 -> 0 MB, 4022 (4877-855) objects, 53/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc6(1): 0+0+273+0 us, 0 -> 0 MB, 5138 (8498-3360) objects, 93/79/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc7(1): 0+0+291+0 us, 0 -> 1 MB, 5441 (11292-5851) objects, 116/102/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc8(1): 0+0+272+0 us, 0 -> 1 MB, 7154 (15708-8554) objects, 144/137/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc9(1): 0+0+335+0 us, 0 -> 1 MB, 7012 (20387-13375) objects, 148/145/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc10(1): 0+0+328+0 us, 0 -> 1 MB, 7491 (24826-17335) objects, 148/122/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc11(1): 0+0+418+0 us, 0 -> 1 MB, 8621 (30277-21656) objects, 156/154/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc12(1): 0+0+432+0 us, 0 -> 1 MB, 8163 (35081-26918) objects, 160/146/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc13(1): 0+0+455+0 us, 0 -> 1 MB, 9824 (40958-31134) objects, 164/92/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc14(1): 1+0+460+0 us, 0 -> 1 MB, 10790 (47285-36495) objects, 179/176/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc15(1): 0+0+505+0 us, 0 -> 1 MB, 12107 (54145-42038) objects, 191/187/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc16(1): 0+0+583+0 us, 0 -> 1 MB, 12634 (61293-48659) objects, 205/193/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc17(1): 0+0+866+0 us, 0 -> 1 MB, 14668 (69485-54817) objects, 221/216/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc18(1): 0+0+726+0 us, 1 -> 2 MB, 15853 (78044-62191) objects, 243/217/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc19(1): 0+0+2101+0 us, 1 -> 2 MB, 18404 (88114-69710) objects, 264/245/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc1(1): 6+0+249+0 us, 0 -> 0 MB, 18 (19-1) objects, 0/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc2(1): 0+0+132+0 us, 0 -> 0 MB, 382 (383-1) objects, 14/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc3(1): 0+0+208+0 us, 0 -> 0 MB, 447 (515-68) objects, 37/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc4(1): 1+0+318+0 us, 0 -> 0 MB, 1166 (1300-134) objects, 44/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc5(1): 1+0+227+0 us, 0 -> 0 MB, 3820 (4486-666) objects, 60/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc6(1): 1+0+236+0 us, 0 -> 0 MB, 3776 (7628-3852) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc7(1): 1+0+228+0 us, 0 -> 0 MB, 3814 (10808-6994) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc8(1): 1+0+245+0 us, 0 -> 0 MB, 3840 (14014-10174) objects, 110/1/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc9(1): 1+0+248+0 us, 0 -> 0 MB, 3819 (17199-13380) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc10(1): 1+0+248+0 us, 0 -> 0 MB, 3817 (20382-16565) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc11(1): 1+0+230+0 us, 0 -> 0 MB, 3778 (23526-19748) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc12(1): 1+0+260+0 us, 0 -> 0 MB, 3814 (26706-22892) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc13(1): 1+0+227+0 us, 0 -> 0 MB, 3772 (29844-26072) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc14(1): 1+0+255+0 us, 0 -> 0 MB, 3808 (33019-29211) objects, 110/1/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc15(1): 1+0+292+0 us, 0 -> 0 MB, 3769 (36155-32386) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc16(1): 1+0+250+0 us, 0 -> 0 MB, 3808 (39330-35522) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc17(1): 1+0+263+0 us, 0 -> 0 MB, 3769 (42466-38697) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc18(1): 1+0+250+0 us, 0 -> 0 MB, 3808 (45641-41833) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc19(1): 1+0+240+0 us, 0 -> 0 MB, 3785 (48793-45008) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc20(1): 1+0+250+0 us, 0 -> 0 MB, 3753 (51922-48169) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc21(1): 1+0+383+0 us, 0 -> 0 MB, 5799 (57088-51289) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc22(1): 1+0+319+0 us, 0 -> 0 MB, 7987 (64442-56455) objects, 110/96/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc23(1): 1+0+372+0 us, 0 -> 0 MB, 8023 (71815-63792) objects, 112/103/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc24(1): 2+1+306+1 us, 0 -> 0 MB, 8023 (79188-71165) objects, 113/104/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc25(1): 1+0+326+0 us, 0 -> 0 MB, 8023 (86561-78538) objects, 113/103/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc26(1): 1+0+366+0 us, 0 -> 0 MB, 8023 (93934-85911) objects, 113/103/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc27(1): 1+0+279+0 us, 0 -> 0 MB, 3092 (96377-93285) objects, 113/103/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc28(1): 1+0+255+0 us, 0 -> 0 MB, 2432 (98123-95691) objects, 115/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc29(1): 1+0+267+0 us, 0 -> 0 MB, 2607 (100033-97426) objects, 123/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc30(1): 1+0+271+0 us, 0 -> 0 MB, 4588 (103918-99330) objects, 124/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc31(1): 1+0+428+0 us, 0 -> 0 MB, 4470 (107680-103210) objects, 124/1/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc32(1): 1+0+437+0 us, 0 -> 1 MB, 8525 (115449-106924) objects, 124/113/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc33(1): 1+0+469+0 us, 0 -> 1 MB, 9649 (124310-114661) objects, 126/117/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc34(1): 1+0+472+0 us, 0 -> 1 MB, 9660 (133178-123518) objects, 135/126/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc35(1): 1+0+438+0 us, 0 -> 1 MB, 9216 (141607-132391) objects, 135/126/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc36(1): 1+0+433+0 us, 0 -> 1 MB, 9849 (150651-140802) objects, 135/125/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc37(1): 1+1+297+0 us, 0 -> 1 MB, 9860 (159702-149842) objects, 140/131/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc38(1): 1+0+300+0 us, 0 -> 1 MB, 8342 (167235-158893) objects, 140/131/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc39(1): 2+0+502+0 us, 0 -> 10 MB, 2297 (168677-166380) objects, 140/23/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc40(1): 1+0+349+0 us, 10 -> 30 MB, 915 (168707-167792) objects, 140/1/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
scvg0: inuse: 44, idle: 0, sys: 45, released: 0, consumed: 45 (MB)
gc41(1): 1+0+473+0 us, 30 -> 61 MB, 490591 (658410-167819) objects, 140/79/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
scvg1: inuse: 32, idle: 29, sys: 62, released: 0, consumed: 62 (MB)
scvg2: inuse: 51, idle: 10, sys: 62, released: 0, consumed: 62 (MB)
gc42(1): 0+0+332+0 us, 30 -> 61 MB, 495738 (1151522-655784) objects, 4009/3998/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
scvg3: inuse: 40, idle: 23, sys: 63, released: 0, consumed: 63 (MB)

Dave Cheney

Aug 27, 2014, 1:39:59 AM
to Blake Caldwell, golang-nuts, Jesse McNelis
Looking at this, your program has asked for 63MB from the operating system: 40 is in use, 23 is idle and _may_ be released to the operating system via madvise(2) iff a page is completely free. The operating system is free to ignore this advice and frequently does.

As for the other 400 MB reported: unless it comes from pthread stacks, I'd say your operating system has lost the ability to count.

Can you try something like pmap on a leaking process? I find that a pretty simple way of locating and counting the pthread stacks active in a process.

Dmitry Vyukov

Aug 27, 2014, 4:13:59 AM
to Blake Caldwell, golang-nuts, Jesse McNelis
On Wed, Aug 27, 2014 at 9:21 AM, Blake Caldwell <bl...@fogcreek.com> wrote:
Okay, now that I've been able to exaggerate this "leak", it's showing up in the pprof charts. So, I'm now seeing two weird things. The first is clearly this unidentified memory usage, and why it's exaggerated when I use bigger buffers to copy between I/O streams. The second is the 4X++ discrepancy between the runtime's stats and the OS. Dave suggested that it might be OS threads - I checked: OSX said the process was using 8 threads.

Here's a recent graph, showing that the runtime *is* aware of the memory usage. It's a ton of little bits that are getting GC'd. 

This now smells like a tight loop somewhere. I just did a 30 second profile, and got this - is this a concern?

Total: 442 samples
     442 100.0% 100.0%      442 100.0% runtime.mach_semaphore_timedwait
       0   0.0% 100.0%      442 100.0% notetsleep
       0   0.0% 100.0%      442 100.0% runtime.MHeap_Scavenger
       0   0.0% 100.0%      442 100.0% runtime.gosched0
       0   0.0% 100.0%      442 100.0% runtime.notetsleepg

It's reassuring that I can chart this madness... but, at 75MB, it still only accounts for about 25% of what the system is reporting, at around 320MB. The OS says I have 8 threads, fwiw.


Are you looking at resident or virtual memory consumption?

 


Blake Caldwell

Aug 27, 2014, 11:41:01 AM
to golan...@googlegroups.com, bl...@fogcreek.com, jes...@jessta.id.au
On the Mac, I was in Activity Monitor, looking at "Real Memory Size", watching it slowly go up while the runtime profiling also showed memory going up, but being collected. 

Blake Caldwell

Aug 27, 2014, 11:49:31 AM
to golan...@googlegroups.com, bl...@fogcreek.com, jes...@jessta.id.au
I'll try out what you suggested. I think what I'm seeing via the runtime profiling is some goroutine using local variables in a loop - it might just be the Go runtime or my profiling loop. This is looking like standard garbage collection - nothing to be alarmed by. Here's an example of my service doing nothing but monitoring memory and sleeping:

It's only once I start pushing the upper limit higher that I see more of a swing in memory usage, but that's expected - GC isn't jumping in until that upper limit is reached.

The confusing part is what OSX is telling me, and you might be right that it's just dumb, or that it involves threads. When I start the app, the system reports the service using 6 threads. As I start doing more, it goes to 8. I'll start running this primarily on Linux and see what it says. 

Thanks guys - I'll post again when I see what it's doing in Linux.

Carlos Castillo

Aug 27, 2014, 11:51:33 AM
to Blake Caldwell, golang-nuts, jes...@jessta.id.au
On the mac, if you look at the detailed info for a task (Command-I), you can see the virtual size, but it's entirely useless (130+GB) because of the way go reserves memory from the OS.





--
Carlos Castillo

Blake Caldwell

Aug 27, 2014, 11:52:00 AM
to golan...@googlegroups.com, bl...@fogcreek.com, jes...@jessta.id.au
Oh, I know. I wasn't talking about that one. Yep, it's worthless :)

Carlos Castillo

Aug 27, 2014, 11:52:44 AM
to Blake Caldwell, golang-nuts, jes...@jessta.id.au
Go uses multiple threads, even on single-threaded code with GOMAXPROCS=1 (the default).
Carlos Castillo

Carlos Castillo

Aug 27, 2014, 11:55:00 AM
to Blake Caldwell, golang-nuts, jes...@jessta.id.au
ie: a program whose sole purpose is to sleep for 30 seconds uses 3 threads at (or near) tip, and more in 1.3
--
Carlos Castillo

Blake Caldwell

Aug 27, 2014, 11:57:31 AM
to golan...@googlegroups.com, bl...@fogcreek.com, jes...@jessta.id.au
Oh, sure... that's not a surprise. I just wonder if the apparent memory bloat is just because of something OSX does with threads.

Carlos Castillo

Aug 27, 2014, 12:01:24 PM
to Blake Caldwell, golang-nuts, jes...@jessta.id.au
Real memory (resident size) only increases when memory is actually used. A C thread that has a 32MB stack only actually uses 32MB when it's within 4k of overflowing it.

Are you absolutely sure you are not using some 3rd party package that could be allocating untracked memory (go list -f '{{.Deps}}')?

Blake Caldwell

Aug 27, 2014, 12:23:27 PM
to golan...@googlegroups.com, bl...@fogcreek.com, jes...@jessta.id.au

(heh, shrugs)

I haven't used "go list" before... I assume I append my package at the end of that command. Most of those are imported via go.crypto/ssh:

bufio
bytes
compress/flate
compress/gzip
container/list
crypto
crypto/aes
crypto/cipher
crypto/des
crypto/dsa
crypto/ecdsa
crypto/elliptic
crypto/hmac
crypto/md5
crypto/rand
crypto/rc4
crypto/rsa
crypto/sha1
crypto/sha256
crypto/sha512
crypto/subtle
crypto/tls
crypto/x509
crypto/x509/pkix
encoding
encoding/asn1
encoding/base64
encoding/binary
encoding/hex
encoding/json
encoding/pem
errors
flag
fmt
hash
hash/crc32
html
html/template
io
io/ioutil
log
math
math/big
math/rand
mime
mime/multipart
net
net/http
net/http/pprof
net/textproto
net/url
os
os/exec
os/signal
path
path/filepath
reflect
runtime
runtime/cgo
runtime/pprof
sort
strconv
strings
sync
sync/atomic
syscall
text/tabwriter
text/template
text/template/parse
time
unicode
unicode/utf16
unicode/utf8
unsafe

Carlos Castillo

Aug 27, 2014, 1:15:22 PM
to Blake Caldwell, golang-nuts, jes...@jessta.id.au
Yes you can call go list like most go subcommands:
  • On a package
  • On the package in the current directory (no arguments)
I can't look at the fogcreek.com/* packages (sorry if I outed anything...), and the uuid package seems clean of cgo (aside from the call to net.Interfaces() @ https://code.google.com/p/go-uuid/source/browse/uuid/node.go#31), not that that is likely the problem. We've already discussed that go.crypto/ssh is clean, although again, it uses cgo through its use of the net package. 

Re-iterating on what I said earlier "go list -json" prints all info that go list collects in JSON format. The "Deps" field shows all packages needed by the target package, while "Imports" shows the packages directly imported by it. If the problem is cgo/syscall usage it will be more likely in packages that import them directly. Also note, a package that uses cgo directly imports "C", but indirect use will show up as runtime/cgo.

Blake Caldwell

Aug 27, 2014, 11:49:42 PM
to golan...@googlegroups.com
Pretty sure I mostly figured out what's going on. I still don't know why OSX was so different from the Go runtime's profiling, but figured out where a lot of my bloat is coming from.

I'm writing an SSH proxy, so I'm listening for new SSH connection attempts, then opening an SSH connection up with a remote machine and proxying stdout, stderr, and stdin between the two endpoints. I'm (basically) using io.Copy to copy data from one side to the other. The Readers and Writers involved are of these types:

- *ssh.extChannel
- *ssh.sessionStdin
- *ssh.channel

I thought I noticed a leak, and wanted to exaggerate it to help track it down, so I started messing around with some things. In this case, it was the size of the buffer I'm using for reading and writing. I haven't checked out all three of the Reader/Writer types, but here's what Write calls in the first one I took a look at:

"packet" on line 243 is being allocated each loop, copying data from the channel to it before *then* copying it to the buffer.

Using a realistic buffer (rather than my exaggerated one for testing) should be fine, but I'm going to see if there's either a better way to proxy between these channels or a way to improve this SSH code.

Thanks for all the help! I'll post back my resolution when I reach it.



Dave Cheney

Aug 27, 2014, 11:55:24 PM
to Blake Caldwell, golang-nuts
It is possible that we could reuse the packet buffer in that function - please log an issue.



Dave Cheney

Aug 29, 2014, 3:39:38 AM
to Blake Caldwell, golang-nuts
Hi Blake, 

Do you want to try https://codereview.appspot.com/133200044? You can apply it locally with


or set up the codereview extension as described in http://golang.org/doc/contribute.html

If you can provide some benchmark numbers for heavy use of WriteExtended that would be very useful.

Blake Caldwell

Aug 29, 2014, 4:13:15 PM
to golan...@googlegroups.com, bl...@fogcreek.com
(edit for correctness)

Awesome, thanks Dave. I'll give this a shot.

I took a different approach with my tinkering. In the end, I decided that as long as you use a buffer around 32kb, what exists is fine, and your change will make it a little better.

So, what I did... I updated WriteExtended to use the input data slice as the packets. I only allocate an array that's of size 2*headerSize (18 bytes total). To use the input data as the packet, I have to first get [headerSize] bytes ahead, so I start out by writing a tiny packet first. Each loop iteration then advances by the amount of data we wrote, always leaving an extra 9 bytes of data that I can use for the next packet's header. I store off what the original 9 bytes of data was, then put it back after writing the packet.

I could clean this code up and submit it for review - it certainly allocates very little extra memory, and it does work well in my testing. However, what I'm finding is that the packet slice seems to be changed somewhere downstream of c.writePacket. So, when you'd call my implementation, you'd pass in a data slice to write, and it'd be altered. I haven't seen any documentation guaranteeing that the buffer passed into Write is left unmodified, but I think that's the expectation, so this clearly isn't okay.

I never did figure out why the system freaks out and reports way more memory than the Go runtime says. That really only happened when I'd use a giant buffer. It might be something on the Read side of things, or maybe it's just something silly going on with OSX Mavericks.