Service using much more system memory than runtime.MemStats is showing


Blake Caldwell

Aug 26, 2014, 9:02:34 AM
to golan...@googlegroups.com
I'm having a hard time understanding why my service is using up 400MB-1GB of system memory, as reported by top on Linux, and Activity Monitor on OS X. I figured I'm leaking goroutines or my 32kb buffers, so I tried:
- reusing memory buffers (no change)
- monitoring goroutines and heap with net/http/pprof listening on 6060 - no leaked goroutines and memory was as expected (matching my graphs below)
- exiting with panic to view the lingering goroutines - none
- runtime.MemStats graphing (see the graphs below; a rough sketch of the sampling loop follows this list)
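
For reference, the sampling side of that graphing is roughly this shape (a simplified sketch, not my exact code - the fields come straight from runtime.MemStats):

    package main

    import (
        "log"
        "runtime"
        "time"
    )

    func main() {
        var m runtime.MemStats
        for {
            runtime.ReadMemStats(&m)
            // Sys is what the runtime has obtained from the OS; Alloc is live heap.
            log.Printf("alloc=%dMB sys=%dMB heap-objects=%d num-gc=%d",
                m.Alloc>>20, m.Sys>>20, m.HeapObjects, m.NumGC)
            time.Sleep(10 * time.Second)
        }
    }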

I used the info from this post to graph my memory usage, and found no misbehavior:


Here's the service after a short time:


And here it is after running all night:

This is exactly as I'd expect - about 40MB of memory being used. The sudden drop-offs are the GC doing its job - I verified this by playing with GOGC. 

However... When I use top (Linux and OSX) or Activity Monitor (OSX), the service quickly ramps up to 200MB, then slowly climbs to over a GB when running overnight. It looks like it then dropped down to 400MB and started climbing again. 

This will be a long-running service, so I need to understand the discrepancy between expected and actual system memory usage. I'm clearly missing something. Any insight?

Carlos Castillo

Aug 26, 2014, 4:10:09 PM
to golan...@googlegroups.com
It's good that your graphs are using the Sys value, since that is probably the most accurate value for the amount of memory go has received from the OS.

I'm not sure if you are aware, but the Go runtime will never free memory back to the OS: neither the GC nor the scavenger will do so in a way that is reflected in most OS statistics. This just means the memory use as viewed from outside the program will never go down on its own for Go-managed memory (Go does make it easier for the OS to reclaim memory from your program by marking it unused).

That being said, from your description it's likely that, short of a bug in the statistics, you are using memory that is not tracked by the Go runtime. If you use cgo, any memory allocated by C/C++ code will not be tracked by Go memory statistics. For example, if you are using gopkg.in/qml.v1, all memory used by the UI, including for example any decoded images sent to QML, is stored in memory not tracked by Go. If you use a database with a driver that is not pure Go (eg: most in-process databases like sqlite), its memory use for caching and storing values is not tracked.

Similarly, if you use syscalls that allocate memory, directly or through packages, that memory won't be tracked either. For example, under some usage conditions mmap can allocate memory that counts against the process (although AFAIK its default mode of memory-mapping files usually doesn't).

Carlos Castillo

Aug 26, 2014, 4:36:35 PM
to golan...@googlegroups.com
Also there are two reasons why the techniques you are applying don't seem to have any effect on max memory use:
  1. If the Go statistics are correct, they can only affect the 40MB of memory that the Go code itself is using, eg: 1/5th of the total memory use reported by the system.
  2. Memory use never goes down, and the GC will try to keep it within a limit of 2x the greatest amount of memory actually in use at any given time.
For #2, essentially when the GC runs, it finds all the memory in use at that time, and sets 2x that as the threshold for the next garbage collection. This means that even if you re-use buffers and perform other memory tricks, there is still a minimum amount of memory your code needs in order to proceed. In this case that is likely around 20MB, and the longer the app runs, the greater the chance that a GC happens at a moment when all 20MB is active. At that point the memory limit is set to double that (40MB), and although the limit may drop later, your program will need to use all the memory up to that limit before the GC is called again (to reset the limit). Since memory is never returned by Go (as I mentioned before), that means that all 40MB will be acquired at some point.
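
A quick way to see that threshold for yourself (illustrative only - this just reads the NextGC field from MemStats; needs "fmt" and "runtime"):

    func printGCTarget() {
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        // With the default GOGC=100, NextGC is roughly twice the heap that was
        // live at the end of the previous collection.
        fmt.Printf("live heap: %dMB, next GC target: %dMB\n", m.HeapAlloc>>20, m.NextGC>>20)
    }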

Using those techniques is often still worthwhile though: the more you re-use memory without needing more of it, the less the GC needs to run.
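
For example, one common shape for that kind of buffer re-use is a sync.Pool (just a sketch, the names are made up; needs "io" and "sync"):

    var bufPool = sync.Pool{
        New: func() interface{} { return make([]byte, 32*1024) },
    }

    // copyWithPooledBuffer copies src to dst re-using a pooled 32KB buffer
    // instead of allocating a fresh one per copy.
    func copyWithPooledBuffer(dst io.Writer, src io.Reader) error {
        buf := bufPool.Get().([]byte)
        defer bufPool.Put(buf)
        for {
            n, err := src.Read(buf)
            if n > 0 {
                if _, werr := dst.Write(buf[:n]); werr != nil {
                    return werr
                }
            }
            if err == io.EOF {
                return nil
            }
            if err != nil {
                return err
            }
        }
    }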

One last thing, there is a tool called gcvis (https://github.com/davecheney/gcvis) that allows you to see some of the go memory statistics in graph form without altering your code in any way. It's not as detailed as the tracking you've done, but is useful for a quick diagnosis of new code.

Blake Caldwell

Aug 26, 2014, 4:43:08 PM
to golan...@googlegroups.com
Thanks so much. I hadn't considered memory allocated outside of Go. I'm writing an SSH proxy, using go.crypto/ssh. I'm guessing that's not all in Go, and that I'm using something incorrectly.

Carlos Castillo

Aug 26, 2014, 5:30:33 PM
to Blake Caldwell, golan...@googlegroups.com
The easiest way of finding out what your code is using (to see syscall and cgo use) is to use go list.

Eg: "go list -json" will quickly show you which packages a package imports itself (shown as Imports) and transitively (Deps).

Also if the documentation for a package is visible through godoc.org then at the bottom is a link to see a graph of all imported packages for the package in question. 

This way you should be able to narrow down the possible places where the C (ie: cgo) and syscall packages are used. Note that the standard library uses syscall a lot (from the os package) and cgo in the net and crypto packages, albeit infrequently, though neither case should affect your memory in any unexpected ways. 


--
Carlos Castillo

Dave Cheney

Aug 26, 2014, 11:16:07 PM
to golan...@googlegroups.com
go.crypto/ssh is 100% written in Go.

Are you setting GOMAXPROCS > 1 by any chance?

Blake Caldwell

Aug 26, 2014, 11:32:37 PM
to golan...@googlegroups.com
Nope, using the default for GOMAXPROCS

How about net/http? Also 100% Go?

Dave Cheney

Aug 26, 2014, 11:40:37 PM
to Blake Caldwell, golang-nuts
Yes, with the exception of dns resolution which uses cgo to call your
system name resolver.

I suspect your program has a lot of native threads running; each
consumes several megabytes (the number is highly OS dependent).

As an experiment, try building a version of Go with cgo disabled; this
will revert to the pure Go DNS resolver:

export CGO_ENABLED=0 && ./all.bash

Blake Caldwell

Aug 27, 2014, 12:29:25 AM
to golan...@googlegroups.com, bl...@fogcreek.com
I was able to provoke my service into *really* misbehaving in a weird way. As I mentioned, this is an SSH proxy. I've been very careful to defer close everything, and I've verified these goroutines *are* closing with top-level defers that output to the log.

When I start the service, I'm listening for connections. There's no leak at this point. The service rests at around 15MB.

When I receive a connection, I kick off a goroutine to handle it, and in that goroutine I kick off 3 more - one each for proxying stdout, stderr, and stdin. Again - all of these *do* close when I expect them to. After receiving just one SSH connection, dialing to another server, and successfully proxying between them, leaks start happening like crazy, even when no new connections come in. Here's the thing, though... The leaks are only obvious if I use a big []byte buffer when reading and writing between the different connections' channels. The leak is about 5MB/second when I use any buffer larger than 1MB. This seems to be the upper bound - if I use even bigger buffers, this doesn't leak any worse.
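
The fan-out is roughly this shape (heavily simplified - the real code has logging, defers, and the concrete ssh types; the names here are illustrative, and it needs "io" and "sync"):

    func proxyStreams(clientIn io.Reader, clientOut, clientErr io.Writer,
        remoteIn io.Writer, remoteOut, remoteErr io.Reader) {
        var wg sync.WaitGroup
        wg.Add(3)
        go func() { defer wg.Done(); io.Copy(remoteIn, clientIn) }()   // stdin: client -> remote
        go func() { defer wg.Done(); io.Copy(clientOut, remoteOut) }() // stdout: remote -> client
        go func() { defer wg.Done(); io.Copy(clientErr, remoteErr) }() // stderr: remote -> client
        wg.Wait()
    }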

So... looking at my active goroutines before and after this first SSH connection proxy, I *do* see something interesting, but I haven't the time tonight to dive into the call stacks. These two goroutines weren't there before my first SSH proxy, but they stick around after:

I'm a little confused why I'm seeing net/http here. I *do* make an HTTP request for authorization of the public key during the SSH handshake, but I'm pretty sure that's not the issue. I'm also using pprof and hosting the server on 6060 as I mentioned above. Those should be the only explicit uses of http on my end.
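
(The pprof server is just the standard net/http/pprof pattern, roughly this, so a net/http goroutine or two from that side is expected:)

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers the /debug/pprof/* handlers
    )

    func main() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }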

Any ideas? I know it's tough without sharing my code :(

Dave Cheney

Aug 27, 2014, 12:38:01 AM
to Blake Caldwell, golang-nuts

Yup, it's tough.

Can you share a goroutine dump? Kill your process with a SIGQUIT when you believe it is leaking.
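
If it's easier than SIGQUIT, you can get much the same dump in-process with runtime/pprof (sketch, needs "os" and "runtime/pprof"):

    // debug=2 prints full stacks for every goroutine, in the same format
    // as an unrecovered panic.
    pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)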

Jesse McNelis

Aug 27, 2014, 12:55:21 AM
to Blake Caldwell, golang-nuts

When you say 'leak' what are you referring to?
Are you saying that the memory the runtime reports it's using is that much less than what the OS says the process is using,
and that the difference grows over time?

Blake Caldwell

Aug 27, 2014, 1:21:38 AM
to golan...@googlegroups.com, bl...@fogcreek.com, jes...@jessta.id.au
Okay, now that I've been able to exaggerate this "leak", it's showing up in the pprof charts. So, I'm now seeing two weird things. The first is clearly this unidentified memory usage, and why it's exaggerated when I use bigger buffers to copy between I/O streams. The second is the 4X++ discrepancy between the runtime's stats and the OS. Dave suggested that it might be OS threads - I checked: OSX said the process was using 8 threads.

Here's a recent graph, showing that the runtime *is* aware of the memory usage. It's a ton of little bits that are getting GC'd. 

This now smells like a tight loop somewhere. I just did a 30 second profile, and got this - is this a concern?

Total: 442 samples
     442 100.0% 100.0%      442 100.0% runtime.mach_semaphore_timedwait
       0   0.0% 100.0%      442 100.0% notetsleep
       0   0.0% 100.0%      442 100.0% runtime.MHeap_Scavenger
       0   0.0% 100.0%      442 100.0% runtime.gosched0
       0   0.0% 100.0%      442 100.0% runtime.notetsleepg

It's reassuring that I can chart this madness... but, at 75MB, it still only accounts for about 25% of what the system is reporting, at around 320MB. The OS says I have 8 threads, fwiw.

I still think the two goroutines I included in my previous message are fishy. 

Dave Cheney

Aug 27, 2014, 1:26:46 AM
to Blake Caldwell, golang-nuts, Jesse McNelis
On Wed, Aug 27, 2014 at 3:21 PM, Blake Caldwell <bl...@fogcreek.com> wrote:
Okay, now that I've been able to exaggerate this "leak", it's showing up in the pprof charts. So, I'm now seeing two weird things. The first is clearly this unidentified memory usage, and why it's exaggerated when I use bigger buffers to copy between I/O streams. The second is the 4X++ discrepancy between the runtime's stats and the OS. Dave suggested that it might be OS threads - I checked: OSX said the process was using 8 threads.

Here's a recent graph, showing that the runtime *is* aware of the memory usage. It's a ton of little bits that are getting GC'd. 

This now smells like a tight loop somewhere. I just did a 30 second profile, and got this - is this a concern?

Total: 442 samples
     442 100.0% 100.0%      442 100.0% runtime.mach_semaphore_timedwait
       0   0.0% 100.0%      442 100.0% notetsleep
       0   0.0% 100.0%      442 100.0% runtime.MHeap_Scavenger
       0   0.0% 100.0%      442 100.0% runtime.gosched0
       0   0.0% 100.0%      442 100.0% runtime.notetsleepg


Profiling is busted on OSX, I don't trust it. Please profile on linux if possible.

The fact your memory is growing by 1MB per sample suggests:

a. a new thread is being created per request
b. a new 1MB buffer is being created and is retained somehow.

Can you please run your program with

env GODEBUG=gctrace=1
 

It's reassuring that I can chart this madness... but, at 75MB, it still only accounts for about 25% of what the system is reporting, at around 320MB. The OS says I have 8 threads, fwiw.


Ignore what the operating system is saying, especially on darwin. 

On Linux I find the RSS value is more accurate, but the most accurate value comes from the GODEBUG data above. 


Blake Caldwell

Aug 27, 2014, 1:32:05 AM
to golan...@googlegroups.com, bl...@fogcreek.com, jes...@jessta.id.au
This is during a run with the same behavior. The GC kicks in a couple times at the bottom:

gc1(1): 1+0+188+0 us, 0 -> 0 MB, 18 (19-1) objects, 0/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc2(1): 1+0+100+0 us, 0 -> 0 MB, 339 (340-1) objects, 14/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc3(1): 0+0+111+0 us, 0 -> 0 MB, 761 (934-173) objects, 40/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc4(1): 0+0+154+0 us, 0 -> 0 MB, 1502 (1961-459) objects, 42/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc5(1): 1+0+2219+0 us, 0 -> 0 MB, 4022 (4877-855) objects, 53/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc6(1): 0+0+273+0 us, 0 -> 0 MB, 5138 (8498-3360) objects, 93/79/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc7(1): 0+0+291+0 us, 0 -> 1 MB, 5441 (11292-5851) objects, 116/102/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc8(1): 0+0+272+0 us, 0 -> 1 MB, 7154 (15708-8554) objects, 144/137/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc9(1): 0+0+335+0 us, 0 -> 1 MB, 7012 (20387-13375) objects, 148/145/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc10(1): 0+0+328+0 us, 0 -> 1 MB, 7491 (24826-17335) objects, 148/122/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc11(1): 0+0+418+0 us, 0 -> 1 MB, 8621 (30277-21656) objects, 156/154/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc12(1): 0+0+432+0 us, 0 -> 1 MB, 8163 (35081-26918) objects, 160/146/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc13(1): 0+0+455+0 us, 0 -> 1 MB, 9824 (40958-31134) objects, 164/92/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc14(1): 1+0+460+0 us, 0 -> 1 MB, 10790 (47285-36495) objects, 179/176/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc15(1): 0+0+505+0 us, 0 -> 1 MB, 12107 (54145-42038) objects, 191/187/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc16(1): 0+0+583+0 us, 0 -> 1 MB, 12634 (61293-48659) objects, 205/193/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc17(1): 0+0+866+0 us, 0 -> 1 MB, 14668 (69485-54817) objects, 221/216/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc18(1): 0+0+726+0 us, 1 -> 2 MB, 15853 (78044-62191) objects, 243/217/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc19(1): 0+0+2101+0 us, 1 -> 2 MB, 18404 (88114-69710) objects, 264/245/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc1(1): 6+0+249+0 us, 0 -> 0 MB, 18 (19-1) objects, 0/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc2(1): 0+0+132+0 us, 0 -> 0 MB, 382 (383-1) objects, 14/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc3(1): 0+0+208+0 us, 0 -> 0 MB, 447 (515-68) objects, 37/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc4(1): 1+0+318+0 us, 0 -> 0 MB, 1166 (1300-134) objects, 44/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc5(1): 1+0+227+0 us, 0 -> 0 MB, 3820 (4486-666) objects, 60/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc6(1): 1+0+236+0 us, 0 -> 0 MB, 3776 (7628-3852) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc7(1): 1+0+228+0 us, 0 -> 0 MB, 3814 (10808-6994) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc8(1): 1+0+245+0 us, 0 -> 0 MB, 3840 (14014-10174) objects, 110/1/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc9(1): 1+0+248+0 us, 0 -> 0 MB, 3819 (17199-13380) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc10(1): 1+0+248+0 us, 0 -> 0 MB, 3817 (20382-16565) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc11(1): 1+0+230+0 us, 0 -> 0 MB, 3778 (23526-19748) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc12(1): 1+0+260+0 us, 0 -> 0 MB, 3814 (26706-22892) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc13(1): 1+0+227+0 us, 0 -> 0 MB, 3772 (29844-26072) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc14(1): 1+0+255+0 us, 0 -> 0 MB, 3808 (33019-29211) objects, 110/1/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc15(1): 1+0+292+0 us, 0 -> 0 MB, 3769 (36155-32386) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc16(1): 1+0+250+0 us, 0 -> 0 MB, 3808 (39330-35522) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc17(1): 1+0+263+0 us, 0 -> 0 MB, 3769 (42466-38697) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc18(1): 1+0+250+0 us, 0 -> 0 MB, 3808 (45641-41833) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc19(1): 1+0+240+0 us, 0 -> 0 MB, 3785 (48793-45008) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc20(1): 1+0+250+0 us, 0 -> 0 MB, 3753 (51922-48169) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc21(1): 1+0+383+0 us, 0 -> 0 MB, 5799 (57088-51289) objects, 110/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc22(1): 1+0+319+0 us, 0 -> 0 MB, 7987 (64442-56455) objects, 110/96/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc23(1): 1+0+372+0 us, 0 -> 0 MB, 8023 (71815-63792) objects, 112/103/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc24(1): 2+1+306+1 us, 0 -> 0 MB, 8023 (79188-71165) objects, 113/104/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc25(1): 1+0+326+0 us, 0 -> 0 MB, 8023 (86561-78538) objects, 113/103/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc26(1): 1+0+366+0 us, 0 -> 0 MB, 8023 (93934-85911) objects, 113/103/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc27(1): 1+0+279+0 us, 0 -> 0 MB, 3092 (96377-93285) objects, 113/103/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc28(1): 1+0+255+0 us, 0 -> 0 MB, 2432 (98123-95691) objects, 115/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc29(1): 1+0+267+0 us, 0 -> 0 MB, 2607 (100033-97426) objects, 123/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc30(1): 1+0+271+0 us, 0 -> 0 MB, 4588 (103918-99330) objects, 124/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc31(1): 1+0+428+0 us, 0 -> 0 MB, 4470 (107680-103210) objects, 124/1/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc32(1): 1+0+437+0 us, 0 -> 1 MB, 8525 (115449-106924) objects, 124/113/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc33(1): 1+0+469+0 us, 0 -> 1 MB, 9649 (124310-114661) objects, 126/117/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc34(1): 1+0+472+0 us, 0 -> 1 MB, 9660 (133178-123518) objects, 135/126/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc35(1): 1+0+438+0 us, 0 -> 1 MB, 9216 (141607-132391) objects, 135/126/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc36(1): 1+0+433+0 us, 0 -> 1 MB, 9849 (150651-140802) objects, 135/125/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc37(1): 1+1+297+0 us, 0 -> 1 MB, 9860 (159702-149842) objects, 140/131/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc38(1): 1+0+300+0 us, 0 -> 1 MB, 8342 (167235-158893) objects, 140/131/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc39(1): 2+0+502+0 us, 0 -> 10 MB, 2297 (168677-166380) objects, 140/23/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc40(1): 1+0+349+0 us, 10 -> 30 MB, 915 (168707-167792) objects, 140/1/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
scvg0: inuse: 44, idle: 0, sys: 45, released: 0, consumed: 45 (MB)
gc41(1): 1+0+473+0 us, 30 -> 61 MB, 490591 (658410-167819) objects, 140/79/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
scvg1: inuse: 32, idle: 29, sys: 62, released: 0, consumed: 62 (MB)
scvg2: inuse: 51, idle: 10, sys: 62, released: 0, consumed: 62 (MB)
gc42(1): 0+0+332+0 us, 30 -> 61 MB, 495738 (1151522-655784) objects, 4009/3998/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
scvg3: inuse: 40, idle: 23, sys: 63, released: 0, consumed: 63 (MB)

Dave Cheney

Aug 27, 2014, 1:39:59 AM
to Blake Caldwell, golang-nuts, Jesse McNelis
Looking at this, your program has asked for 63MB from the operating system: 40 is in use, 23 is idle and _may_ be released to the operating system via madvise(2) iff a page is completely free. The operating system is free to ignore this advice and frequently does.

As for the other 400 MB reported: unless it comes from pthread stacks, I'd say your operating system has lost the ability to count.

Can you try something like pmap on a leaking process? I find that a pretty simple way of locating and counting the pthread stacks active in a process.

Dmitry Vyukov

Aug 27, 2014, 4:13:59 AM
to Blake Caldwell, golang-nuts, Jesse McNelis
On Wed, Aug 27, 2014 at 9:21 AM, Blake Caldwell <bl...@fogcreek.com> wrote:
Okay, now that I've been able to exaggerate this "leak", it's showing up in the pprof charts. So, I'm now seeing two weird things. The first is clearly this unidentified memory usage, and why it's exaggerated when I use bigger buffers to copy between I/O streams. The second is the 4X++ discrepancy between the runtime's stats and the OS. Dave suggested that it might be OS threads - I checked: OSX said the process was using 8 threads.

Here's a recent graph, showing that the runtime *is* aware of the memory usage. It's a ton of little bits that are getting GC'd. 

This now smells like a tight loop somewhere. I just did a 30 second profile, and got this - is this a concern?

Total: 442 samples
     442 100.0% 100.0%      442 100.0% runtime.mach_semaphore_timedwait
       0   0.0% 100.0%      442 100.0% notetsleep
       0   0.0% 100.0%      442 100.0% runtime.MHeap_Scavenger
       0   0.0% 100.0%      442 100.0% runtime.gosched0
       0   0.0% 100.0%      442 100.0% runtime.notetsleepg

It's reassuring that I can chart this madness... but, at 75MB, it still only accounts for about 25% of what the system is reporting, at around 320MB. The OS says I have 8 threads, fwiw.


Are you looking at resident or virtual memory consumption?

 


Blake Caldwell

Aug 27, 2014, 11:41:01 AM
to golan...@googlegroups.com, bl...@fogcreek.com, jes...@jessta.id.au
On the Mac, I was in Activity Monitor, looking at "Real Memory Size", watching it slowly go up while the runtime profiling also showed memory going up, but being collected. 

Blake Caldwell

Aug 27, 2014, 11:49:31 AM
to golan...@googlegroups.com, bl...@fogcreek.com, jes...@jessta.id.au
I'll try out what you suggested. I think what I'm seeing via the runtime profiling is some goroutine using local variables in a loop - it might just be the Go runtime or my profiling loop. This is looking like standard garbage collection - nothing to be alarmed by. Here's an example of my service doing nothing but monitoring memory and sleeping:

It's only once I start pushing the upper limit higher that I see more of a swing in memory usage, but that's expected - GC isn't jumping in until that upper limit is reached.

The confusing part is what OSX is telling me, and you might be right that it's just dumb, or that it involves threads. When I start the app, the system reports the service using 6 threads. As I start doing more, it goes to 8. I'll start running this primarily on Linux and see what it says. 

Thanks guys - I'll post again when I see what it's doing in Linux.

Carlos Castillo

Aug 27, 2014, 11:51:33 AM
to Blake Caldwell, golang-nuts, jes...@jessta.id.au
On the mac, if you look at the detailed info for a task (Command-I), you can see the virtual size, but it's entirely useless (130+GB) because of the way go reserves memory from the OS.





--
Carlos Castillo

Blake Caldwell

Aug 27, 2014, 11:52:00 AM
to golan...@googlegroups.com, bl...@fogcreek.com, jes...@jessta.id.au
Oh, I know. I wasn't talking about that one. Yep, it's worthless :)

Carlos Castillo

Aug 27, 2014, 11:52:44 AM
to Blake Caldwell, golang-nuts, jes...@jessta.id.au
Go uses multiple threads, even on single-threaded code with GOMAXPROCS=1 (the default).
Carlos Castillo

Carlos Castillo

Aug 27, 2014, 11:55:00 AM
to Blake Caldwell, golang-nuts, jes...@jessta.id.au
ie: a program whose sole purpose is to sleep for 30 seconds uses 3 threads at (or near) tip, and more in 1.3
--
Carlos Castillo

Blake Caldwell

Aug 27, 2014, 11:57:31 AM
to golan...@googlegroups.com, bl...@fogcreek.com, jes...@jessta.id.au
Oh, sure... that's not a surprise. I just wonder if the apparent memory bloat is just because of something OSX does with threads.

Carlos Castillo

Aug 27, 2014, 12:01:24 PM
to Blake Caldwell, golang-nuts, jes...@jessta.id.au
Real memory (resident size) only increases when memory is actually used. A C thread that has a 32MB stack only actually uses 32MB when it's within 4k of overflowing it.

Are you absolutely sure you are not using some 3rd party package that could be allocating untracked memory (go list -f '{{.Deps}}')?

Blake Caldwell

Aug 27, 2014, 12:23:27 PM
to golan...@googlegroups.com, bl...@fogcreek.com, jes...@jessta.id.au

(heh, shrugs)

I haven't used "go list" before... I assume I append my package at the end of that command. Most of those are imported via go.crypto/ssh:

bufio
bytes
compress/flate
compress/gzip
container/list
crypto
crypto/aes
crypto/cipher
crypto/des
crypto/dsa
crypto/ecdsa
crypto/elliptic
crypto/hmac
crypto/md5
crypto/rand
crypto/rc4
crypto/rsa
crypto/sha1
crypto/sha256
crypto/sha512
crypto/subtle
crypto/tls
crypto/x509
crypto/x509/pkix
encoding
encoding/asn1
encoding/base64
encoding/binary
encoding/hex
encoding/json
encoding/pem
errors
flag
fmt
hash
hash/crc32
html
html/template
io
io/ioutil
log
math
math/big
math/rand
mime
mime/multipart
net
net/http
net/http/pprof
net/textproto
net/url
os
os/exec
os/signal
path
path/filepath
reflect
runtime
runtime/cgo
runtime/pprof
sort
strconv
strings
sync
sync/atomic
syscall
text/tabwriter
text/template
text/template/parse
time
unicode
unicode/utf16
unicode/utf8
unsafe

Carlos Castillo

Aug 27, 2014, 1:15:22 PM
to Blake Caldwell, golang-nuts, jes...@jessta.id.au
Yes you can call go list like most go subcommands:
  • On a package
  • On the package in the current directory (no arguments)
I can't look at the fogcreek.com/* packages (sorry if I outed anything...), and the uuid package seems clean of cgo (aside from the call to net.Interfaces() @ https://code.google.com/p/go-uuid/source/browse/uuid/node.go#31), not that that is likely the problem. We've already discussed that go.crypto/ssh is clean, although again, it uses cgo through its use of the net package. 

Re-iterating on what I said earlier "go list -json" prints all info that go list collects in JSON format. The "Deps" field shows all packages needed by the target package, while "Imports" shows the packages directly imported by it. If the problem is cgo/syscall usage it will be more likely in packages that import them directly. Also note, a package that uses cgo directly imports "C", but indirect use will show up as runtime/cgo.

Blake Caldwell

Aug 27, 2014, 11:49:42 PM
to golan...@googlegroups.com
Pretty sure I mostly figured out what's going on. I still don't know why OSX was so different from the Go runtime's profiling, but figured out where a lot of my bloat is coming from.

I'm writing an SSH proxy, so I'm listening for new SSH connection attempts, then opening an SSH connection up with a remote machine and proxying stdout, stderr, and stdin between the two endpoints. I'm (basically) using io.Copy to copy data from one side to the other. The Readers and Writers involved are of these types:

- *ssh.extChannel
- *ssh.sessionStdin
- *ssh.channel

I thought I noticed a leak, and wanted to exaggerate it to help track it down, so I started messing around with some things. In this case, it was the size of the buffer I'm using for reading and writing. I haven't checked out all three of the Reader/Writer types, but here's what Write calls in the first one I took a look at:

"packet" on line 243 is being allocated each loop, copying data from the channel to it before *then* copying it to the buffer.

Using a realistic buffer (rather than my exaggerated one for testing) should be fine, but I'm going to see if there's either a better way to proxy between these channels or a way to improve this SSH code.

Thanks for all the help! I'll post back my resolution when I reach it.



Dave Cheney

Aug 27, 2014, 11:55:24 PM
to Blake Caldwell, golang-nuts
It is possible that we could reuse the packet buffer in that function - please log an issue.



Dave Cheney

Aug 29, 2014, 3:39:38 AM
to Blake Caldwell, golang-nuts
Hi Blake, 

Do you want to try https://codereview.appspot.com/133200044? You can apply it locally with


or set up the codereview extension as described in http://golang.org/doc/contribute.html

If you can provide some benchmark numbers for heavy use of WriteExtended that would be very useful.

Blake Caldwell

Aug 29, 2014, 4:13:15 PM
to golan...@googlegroups.com, bl...@fogcreek.com
(edit for correctness)

Awesome, thanks Dave. I'll give this a shot.

I took a different approach with my tinkering. In the end, I decided that as long as you use a buffer around 32kb, what exists is fine, and your change will make it a little better.

So, what I did... I updated WriteExtended to use the input data slice as the packets. I only allocate an array that's of size 2*headerSize (18 bytes total). To use the input data as the packet, I have to first get [headerSize] bytes ahead, so I start out by writing a tiny packet first. Each loop iteration then advances by the amount of data we wrote, always leaving an extra 9 bytes of data that I can use for the next packet's header. I store off what the original 9 bytes of data was, then put it back after writing the packet.

I could clean this code up and submit it for review - it certainly allocates very little extra memory, and it does work well in my testing. However, what I'm finding is that the packet slice seems to be changed somewhere downstream of c.writePacket. So, when you'd call my implementation, you'd pass in a data slice to write, and it'd be altered. I haven't seen any documentation guaranteeing that the buffer passed into Write is left unmodified, but I think that's the expectation, so this clearly isn't okay.

I never did figure out why the system freaks out and reports way more memory than the Go runtime says. That really only happened when I'd use a giant buffer. It might be something on the Read side of things, or maybe it's just something silly going on with OSX Mavericks.