StackMin = 8192

hstimer

Oct 8, 2013, 12:57:05 PM
to golan...@googlegroups.com
For those of us whose goroutines fit nicely in 4k, and who run hundreds of thousands of them, this is a pretty big change. It nearly doubles our resident memory usage.

Can we delay this change until we can make this runtime configurable? 

Dmitry Vyukov

Oct 8, 2013, 1:02:19 PM
to Russ Cox, golang-nuts, hstimer
+rsc

Ian Lance Taylor

Oct 8, 2013, 1:18:49 PM
to hstimer, golang-nuts
Are you seeing this result in practice rather than in theory?

On a virtual memory system this should not double the size of resident
memory. It should only double the size of virtual memory.

Ian

Dmitry Vyukov

Oct 8, 2013, 1:27:01 PM
to Ian Lance Taylor, hstimer, golang-nuts
This is not true. If a goroutine ever uses 4+kb of stack (e.g. during incoming event processing), then it will use it for the whole lifetime (e.g. while blocked on a net read with minimal stack depth). Actually, it will use 8kb of RSS even after death, because the stack is not reclaimed.
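
A minimal sketch of the pattern being described (not code from this thread; the sizes are illustrative): each goroutine briefly recurses through a few kB of stack, as if handling an event, and then blocks with almost no live stack. The pages it touched stay resident.

package main

import "time"

// deep touches a bit over 5 kB of stack (11 frames of roughly 0.5 kB each),
// standing in for a transient burst of work such as event processing.
func deep(n int) int {
	var buf [512]byte
	buf[0] = byte(n)
	if n == 0 {
		return int(buf[0])
	}
	return deep(n-1) + int(buf[0])
}

// handler does one burst of deep stack use, then blocks the way a
// goroutine waiting on a network read would.
func handler(block chan struct{}) {
	deep(10)
	<-block
}

func main() {
	block := make(chan struct{})
	for i := 0; i < 100000; i++ {
		go handler(block)
	}
	// While the goroutines idle, RSS reflects the stack pages touched
	// during the burst, not the goroutines' tiny current stack use.
	time.Sleep(time.Hour)
}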

Kevin Gillette

Oct 8, 2013, 1:40:01 PM
to golan...@googlegroups.com, Ian Lance Taylor, hstimer
On Tuesday, October 8, 2013 11:27:01 AM UTC-6, Dmitry Vyukov wrote:
If a goroutine ever uses 4+kb of stack (e.g. during incoming event processing), then it will use it for the whole lifetime

I would think the emphasis would be on goroutines that could manage to stay within 4kb throughout their lifetime.
 
(actually it will use 8kb of RSS even after death, because the stack is not reclaimed).

You make this sound like a memory leak. This is 8kb of RSS that can never be used by anything else in that process?

Dmitry Vyukov

Oct 8, 2013, 1:46:37 PM
to Kevin Gillette, golang-nuts, Ian Lance Taylor, hstimer
It can be used for stack by another goroutine.
So if a goroutine uses 4+kb of stack, then finishes, and then another
goroutine reuses the stack segment, the second goroutine will use
8kb of RSS regardless of how much stack it uses. So it's not as
simple as "goroutines that manage to stay within 4kb throughout their
lifetime".

Chandru

Oct 8, 2013, 3:07:52 PM
to golang-nuts
What does "stack is not reclaimed" mean? If I spawn a large number of goroutines, won't the runtime return their stacks to the OS even if they finish execution?

--
Chandra Sekar.S


Rhys Hiltner

Oct 8, 2013, 7:47:27 PM
to golan...@googlegroups.com, Kevin Gillette, Ian Lance Taylor, hstimer
The behavior I see in my app is consistent with Dmitry's explanation. My process sits at around 600k goroutines (which churn when clients connect/disconnect), and its resident memory has increased by about 4k per goroutine since the change (with no other changes to my app).

I've been testing with "devel +094fb360be8d Tue Oct 08 16:53:56 2013 +1100" as the 8k version and "devel +7266a3768bfa Wed Oct 02 12:30:49 2013 -0400" as the 4k version.

This adds around 2.4GB of unused and unreclaimable resident memory to my process.
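
That matches the arithmetic for the per-goroutine increase: roughly 600,000 goroutines × 4 kB of additional paged-in stack each is about 2.4 GB.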

Alexei Sholik

Oct 9, 2013, 9:24:51 AM
to Rhys Hiltner, golang-nuts, Kevin Gillette, Ian Lance Taylor, hstimer
I recently ran a microbenchmark spawning 1M goroutines. Here's the gist: https://gist.github.com/alco/6900869
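
(A rough sketch of the benchmark's overall shape follows, for readers who don't open the gist; the gist is authoritative and the names below are made up: spawn 1M goroutines that each block on a channel, then release them and print MemStats.)

package main

import (
	"fmt"
	"runtime"
)

// report prints a few MemStats fields under a label.
func report(label string) {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Printf("%s: Alloc=%d Sys=%d NumGC=%d\n", label, ms.Alloc, ms.Sys, ms.NumGC)
}

func main() {
	const n = 1000000
	release := make(chan struct{})
	done := make(chan struct{})

	// Spawn n goroutines that block until released.
	for i := 0; i < n; i++ {
		go func() {
			<-release
			done <- struct{}{}
		}()
	}
	report("after spawning")

	// Unblock them all and wait for them to finish.
	close(release)
	for i := 0; i < n; i++ {
		<-done
	}
	runtime.GC()
	report("after releasing")
}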

Points of interest:

* when observing the running program in Activity Monitor on OS X, the Real Mem reading peaks at 4.2 GB for go1.1.2 and at 4.4 GB for go tip. The value of MemStats.Sys appears to show the virtual memory usage, which is twice as large with go tip.

* MemStats.Allocs is slightly below 300 MB. I have no idea about this figure – what does it show exactly?

* running runtime.GC does not seem to free any memory. I must be using it wrong.

For lulz, here's a benchmark of Erlang processes with similar intent: spawn 1M processes that each do an async send and then block on receive. After that, collect all messages and unblock the waiting processes.

The gist https://gist.github.com/alco/6901138. The language used is Elixir.

Here, the times reported by the VM are in agreement with what I observe in the Activity Monitor: 2.8 GB peak for both real and virtual memory. After garbage collection, the reading for Real Mem gradually (as in not all at once) goes down to 40 MB. The virtual memory goes down to 150 MB.



--
Best regards
Alexei Sholik

andrey mirtchovski

Oct 9, 2013, 1:57:07 PM
to Alexei Sholik, Rhys Hiltner, golang-nuts, Kevin Gillette, Ian Lance Taylor, hstimer
Alexei,

I don't know if you plan to put your results in a blog or somewhere else
online, but before you do I want to mention that the timings you have
for the Go program are a result of memory thrashing. I ran your tests
unmodified on a machine with barely enough memory to hold the bigger
stack size for 1M goroutines, as well as on two other boxes -- one
running OS X with sufficient memory to perform the test without
swapping, and the other running Linux, also with sufficient memory for
the test. The timings are dramatically different, especially on Linux.
Unfortunately, most online readers will latch onto the difference from
Erlang's numbers in your gist.

https://gist.github.com/mirtchovski/6905198

Alexei Sholik

Oct 9, 2013, 2:24:25 PM
to andrey mirtchovski, Rhys Hiltner, golang-nuts, Kevin Gillette, Ian Lance Taylor, hstimer
Thanks for your comment, Andrey. Those timings got in there because that was what the benchmark was written for initially. However, they don't matter for the purposes of this thread.

What I am curious about is why no memory is freed after calling runtime.GC().

Off-topic:

I rebooted my machine and did 3 runs of the Go program, taking a screenshot of the memory state before each run. Here it is https://github.com/alco/gobench. As before, it would eat up slightly more than 4 GB, so there is enough memory remaining.

It's curious that the benchmark on your Linux machine performs 10 times better than on OS X. I'm guessing that, at least on OS X, the Elixir benchmark would show even better timings on your machine too.

James Bardin

Oct 9, 2013, 2:28:56 PM
to golan...@googlegroups.com, andrey mirtchovski, Rhys Hiltner, Kevin Gillette, Ian Lance Taylor, hstimer


On Wednesday, October 9, 2013 2:24:25 PM UTC-4, alco wrote:

What I am curious about is why no memory is freed after calling runtime.GC().


runtime.GC() runs a garbage collection, which will free memory back to the runtime's allocation pool, not to the operating system. Last time I checked, the runtime was releasing memory back to the OS on a 5min schedule. Try letting it idle for a while and see what happens.
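
One way to watch that distinction (a sketch, not code from this thread, assuming the runtime/debug.FreeOSMemory and MemStats.HeapReleased APIs of that era): allocate and drop some garbage, then compare HeapReleased after a plain GC with HeapReleased after forcing a release. Note that this covers heap memory only; goroutine stacks are a separate story.

package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// heapReleased reads MemStats.HeapReleased, the amount of heap memory
// returned to the OS.
func heapReleased() uint64 {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.HeapReleased
}

func main() {
	// Allocate and drop roughly 1 GB of garbage.
	garbage := make([][]byte, 0, 1024)
	for i := 0; i < 1024; i++ {
		garbage = append(garbage, make([]byte, 1<<20))
	}
	garbage = nil

	runtime.GC() // frees to the runtime's pool, usually not straight to the OS
	fmt.Println("HeapReleased after GC:           ", heapReleased())

	debug.FreeOSMemory() // forces a GC and returns heap spans to the OS where possible
	fmt.Println("HeapReleased after FreeOSMemory: ", heapReleased())
}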

Rhys Hiltner

Oct 9, 2013, 2:34:15 PM
to andrey mirtchovski, Alexei Sholik, golang-nuts, Kevin Gillette, Ian Lance Taylor, hstimer
I have a small test program that demonstrates the behavior Dmitry describes.


It begins by spawning a large number of tiny goroutines. It then kills one of the tiny goroutines, spawns a short-lived goroutine that causes its entire 8k stack to get paged in, and then spawns a replacement tiny goroutine (and repeats this process many times, such that a large number of 8k stacks are entirely paged in).

This is modeled after the behavior I see in my (real) application: I have a large number of goroutines with very small stacks, but there's a small number of goroutines that use larger stacks. When the goroutines churn, eventually every stack gets fully paged in.
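
The program itself is not included in the thread; a hypothetical sketch of the churn it describes might look like the following (the real stackmin.go synchronizes the steps so the "poison" goroutine finishes before its replacement starts; this sketch elides that):

package main

import "time"

// tiny is a goroutine with negligible stack use.
func tiny(quit chan struct{}) { <-quit }

// poison touches roughly 6 kB of stack (6 frames of about 1 kB each),
// more than the old 4 kB minimum, and then returns.
func poison(depth int) byte {
	var buf [1024]byte
	buf[0] = byte(depth)
	if depth > 0 {
		buf[0] += poison(depth - 1)
	}
	return buf[0]
}

func main() {
	const tinyCount = 100000
	quit := make(chan struct{})
	for i := 0; i < tinyCount; i++ {
		go tiny(quit)
	}
	for i := 0; i < tinyCount; i++ {
		quit <- struct{}{} // retire one tiny goroutine
		go poison(5)       // a short-lived goroutine pages in most of a stack segment
		go tiny(quit)      // its replacement may reuse that already-resident stack
	}
	time.Sleep(time.Minute) // inspect RSS with ps while everything idles
}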

My test results are below. With 8k stacks, the virtual size is 845552 kB, but the resident memory is either 464876 kB or 839748 kB, depending on whether the short-lived goroutine forces the entire stack to get paged in.


# StackMin = 8192
$ ./go1.2.094fb360be8d/bin/go version
go version devel +094fb360be8d Tue Oct 08 16:53:56 2013 +1100 linux/amd64
$ ps aux | grep stackmin
rhys     11065  1.3  0.0 257092  4912 pts/0    Sl+  11:29   0:00 ./go1.2.094fb360be8d/bin/go run /home/users/rhys/stackmin.go --poison=false
rhys     11074 25.5  1.8 845552 464876 pts/0   Sl+  11:29   0:00 /tmp/go-build970608091/command-line-arguments/_obj/exe/stackmin --poison=false
$ ps aux | grep stackmin
rhys     11080  2.0  0.0 257092  4844 pts/0    Sl+  11:29   0:00 ./go1.2.094fb360be8d/bin/go run /home/users/rhys/stackmin.go --poison=true
rhys     11089 38.5  3.4 845552 839748 pts/0   Sl+  11:29   0:00 /tmp/go-build623374596/command-line-arguments/_obj/exe/stackmin --poison=true

# StackMin = 4096
$ ./go1.2.7266a3768bfa/bin/go version
go version devel +7266a3768bfa Wed Oct 02 12:30:49 2013 -0400 linux/amd64
$ ps aux | grep stackmin
rhys     11030  0.2  0.0 256960  4820 pts/0    Sl+  11:28   0:00 ./go1.2.7266a3768bfa/bin/go run /home/users/rhys/stackmin.go --poison=false
rhys     11039  2.7  1.7 445360 439876 pts/0   Sl+  11:28   0:00 /tmp/go-build811739671/command-line-arguments/_obj/exe/stackmin --poison=false
$ ps aux | grep stackmin
rhys     11050  1.3  0.0 256960  4940 pts/0    Sl+  11:29   0:00 ./go1.2.7266a3768bfa/bin/go run /home/users/rhys/stackmin.go --poison=true
rhys     11059 27.5  1.7 445360 439900 pts/0   Sl+  11:29   0:00 /tmp/go-build558707099/command-line-arguments/_obj/exe/stackmin --poison=true

Dmitry Vyukov

Oct 10, 2013, 7:56:14 AM
to Chandru, golang-nuts
On Tue, Oct 8, 2013 at 11:07 PM, Chandru <chand...@gmail.com> wrote:
> What does "stack is not reclaimed" mean? If I spawn a large number of
> goroutines, won't the runtime return their stacks to the OS even if they
> finish execution?

No, it won't. The stacks can be reused for new goroutines, though.

Chandru

Oct 10, 2013, 8:00:19 AM
to Dmitry Vyukov, golang-nuts
Is this just a temporary limitation or a deliberate design decision? If deliberate, why?

--
Chandra Sekar.S

Dmitry Vyukov

Oct 10, 2013, 8:06:50 AM
to Chandru, golang-nuts
It's a temporary limitation.
Continuous stacks (which have some chance of appearing in Go 1.3) will
solve this problem.

Alexei Sholik

Oct 10, 2013, 8:16:41 AM
to James Bardin, golang-nuts, andrey mirtchovski, Rhys Hiltner, Kevin Gillette, Ian Lance Taylor, hstimer
I tried putting the program to sleep for 10 minutes after all of the goroutines had been released. The program still used 4 GB of real memory, and internal reporting still showed 200 MB in use.

Used memory after creating 1000000 blocked goroutines:
Allocs: 296576472 bytes
HeapAlloc: 296576472 bytes
Sys mem: 8583142648 bytes
NumGC: 3

Remaining goroutines: 934932
Remaining goroutines after sleep: 3

### 10 minutes later ###

Used memory after freeing 1000000 blocked goroutines:
Allocs: 296576872 bytes
HeapAlloc: 296576872 bytes
Sys mem: 8584191224 bytes
NumGC: 8



Dmitry Vyukov

Oct 10, 2013, 8:19:41 AM
to Alexei Sholik, James Bardin, golang-nuts, andrey mirtchovski, Rhys Hiltner, Kevin Gillette, Ian Lance Taylor, hstimer
Stack memory is never freed to the OS.

Kevin Gillette

Oct 10, 2013, 11:35:02 AM
to golan...@googlegroups.com, Chandru
Sure, but in defense of previous design decisions, this doesn't sound like an intrinsic limitation of split stacks either; it's likely just an implementation issue that is better solved as part of the transition to continuous stacks (which may be a misnomer, since classic C stacks are also "continuous").

tomwilde

Oct 10, 2013, 11:42:58 AM
to golan...@googlegroups.com
How do the proposed "continuous" stacks differ from classic C stacks? I apparently missed the topic; a link would be greatly appreciated.

- Tom

Dmitry Vyukov

Oct 10, 2013, 11:46:47 AM
to tomwilde, golang-nuts
The point is that they are growable and shrinkable. One way to implement
this is segmented stacks; another is continuous stacks.
C stacks are continuous, but not growable/shrinkable.

Russ Cox

Oct 10, 2013, 12:45:48 PM
to Dmitry Vyukov, tomwilde, golang-nuts
Copying stacks are not a panacea. People will still run into cases where they believe too large a stack is being retained while goroutines are idle, or where stacks are being discarded too aggressively, causing slowdowns when they re-expand. You cannot satisfy everyone at all times.

I expect that we will issue Go 1.2rc2 with the 8 kB stacks and collect more information about how well they work for people. I don't know whether the final Go 1.2 will have 4 kB or 8 kB stacks. However, if the stack size is not optimal for your specific application, you can always rebuild Go with the one you want.
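
(For reference: in the Go 1.2-era source layout, StackMin was a constant defined in src/pkg/runtime/stack.h, so the rebuild amounted to editing that value and re-running src/make.bash; the exact path applies to that era's tree and may differ in later releases.)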

Russ
