Go 1.3 Garbage collector not releasing server memory back to system


Alec Matusis

Jun 23, 2014, 8:47:09 PM6/23/14
to golan...@googlegroups.com

We wrote the simplest possible TCP server (with minor logging) to examine its memory footprint (see server.go below).

The server simply accepts connections and does nothing. It is being run on an Ubuntu 12.04.4 LTS server (kernel 3.2.0-61-generic) with Go version go1.3 linux/amd64.


The attached benchmarking program (pulse.go) creates, in this example, 10k connections, disconnects them after 30 seconds, repeats this cycle three times, and then continuously repeats small pulses of 1k connections/disconnections. The command used to test was ./pulse -big=10000 -bs=30.


The first attached graph is obtained by recording runtime.ReadMemStats (via printMem in server.go below) every time the number of clients changes by a multiple of 500; the second graph is the RES memory size that “top” reports for the server process.


The server starts with a negligible 1.6KB of memory. The “big” pulses of 10k connections then push it to ~60MB (as seen by top), or about 16MB of “System” memory as reported by ReadMemStats. As expected, when the 10k pulses end, the in-use memory drops, and eventually the program starts releasing memory back to the OS, as shown by the grey “Released Memory” line.


The problem is that the System Memory (and correspondingly, the RES memory seen by “top”) never drops significantly (although it drops a little as seen in the second graph).


We would expect that after the 10k pulses end, memory would continue to be released until the RES size is the minimum needed to handle each 1k pulse (about 8MB RES as seen by “top” and 2MB in-use as reported by runtime.ReadMemStats). Instead, the RES stays at about 56MB and in-use never drops from its highest value of 60MB at all.


We want to ensure scalability for irregular traffic with occasional spikes, and to be able to run multiple servers on the same box that spike at different times. Is there a way to ensure that as much memory as possible is released back to the system within a reasonable time frame?
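The only knob we are aware of is runtime/debug.FreeOSMemory, which forces a GC and asks the runtime to return as much memory to the OS as it can. Below is a minimal sketch of running it periodically as a companion file to server.go (the file and function names are just ours, for illustration); as far as we understand it only returns freed heap spans, not goroutine stacks, so it may not help here.

// scavenge.go: hypothetical companion file for server.go (same package main).
// It periodically forces a GC and asks the runtime to return freed memory to
// the OS. Start it from main() with `go scavengeLoop()`.
package main

import (
    "runtime/debug"
    "time"
)

func scavengeLoop() {
    for {
        debug.FreeOSMemory()
        time.Sleep(time.Minute)
    }
}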


First graph: http://i.imgur.com/PD4A0q6.png


Second graph: http://i.imgur.com/78QKW0a.png


Code: https://gist.github.com/eugene-bulkin/e8d690b4db144f468bc5


server.go:

package main

import (
    "log"
    "net"
    "runtime"
    "sync"
)

var m sync.Mutex
var num_clients = 0
var cycle = 0

// printMem logs a snapshot of the heap statistics reported by the runtime.
func printMem() {
    var ms runtime.MemStats
    runtime.ReadMemStats(&ms)
    log.Printf("Cycle #%3d: %5d clients | System: %8d Inuse: %8d Released: %8d Objects: %6d\n",
        cycle, num_clients, ms.HeapSys, ms.HeapInuse, ms.HeapReleased, ms.HeapObjects)
}

// handleConnection counts the client in, blocks on Read until the peer
// disconnects, then counts it back out, logging memory stats every 500 clients.
func handleConnection(conn net.Conn) {
    //log.Println("Accepted connection:", conn.RemoteAddr())
    m.Lock()
    num_clients++
    if num_clients%500 == 0 {
        printMem()
    }
    m.Unlock()

    buffer := make([]byte, 256)
    for {
        _, err := conn.Read(buffer)
        if err != nil {
            //log.Println("Lost connection:", conn.RemoteAddr())
            if err := conn.Close(); err != nil {
                log.Println("Connection close error:", err)
            }
            m.Lock()
            num_clients--
            if num_clients%500 == 0 {
                printMem()
            }
            if num_clients == 0 {
                cycle++
            }
            m.Unlock()
            break
        }
    }
}

func main() {
    printMem()
    cycle++
    listener, err := net.Listen("tcp", ":3033")
    if err != nil {
        log.Fatal("Could not listen: ", err)
    }
    for {
        conn, err := listener.Accept()
        if err != nil {
            log.Println("Could not accept client:", err)
            continue
        }
        go handleConnection(conn)
    }
}

pulse.go:

package main

import (
    "flag"
    "log"
    "net"
    "sync"
    "time"
)

var (
    numBig   = flag.Int("big", 4000, "Number of connections in big pulse")
    bigIters = flag.Int("i", 3, "Number of iterations of big pulse")
    bigSep   = flag.Int("bs", 5, "Number of seconds between big pulses")
    numSmall = flag.Int("small", 1000, "Number of connections in small pulse")
    smallSep = flag.Int("ss", 20, "Number of seconds between small pulses")
    linger   = flag.Int("l", 4, "How long connections should linger before being disconnected")
)

var m sync.Mutex

var active_conns = 0
var connections = make(map[net.Conn]bool)

// pulse opens n connections concurrently, keeps them open for `linger`
// seconds, then closes them all.
func pulse(n int, linger int) {
    var wg sync.WaitGroup

    log.Printf("Connecting %d client(s)...\n", n)
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // Dial outside the lock so the connections are made concurrently.
            conn, err := net.Dial("tcp", ":3033")
            if err != nil {
                log.Panicln("Unable to connect: ", err)
            }
            m.Lock()
            active_conns++
            connections[conn] = true
            m.Unlock()
        }()
    }
    wg.Wait()
    if len(connections) != n {
        log.Fatalf("Unable to connect all %d client(s).\n", n)
    }
    log.Printf("Connected %d client(s).\n", n)

    time.Sleep(time.Duration(linger) * time.Second)

    for conn := range connections {
        active_conns--
        if err := conn.Close(); err != nil {
            log.Panicln("Unable to close connection:", err)
        }
        delete(connections, conn)
    }
    if len(connections) > 0 {
        log.Fatalf("Unable to disconnect all %d client(s) [%d remain].\n", n, len(connections))
    }
    log.Printf("Disconnected %d client(s).\n", n)
}

func main() {
    flag.Parse()
    // A few big pulses first, then an endless series of small ones.
    for i := 0; i < *bigIters; i++ {
        pulse(*numBig, *linger)
        time.Sleep(time.Duration(*bigSep) * time.Second)
    }
    for {
        pulse(*numSmall, *linger)
        time.Sleep(time.Duration(*smallSep) * time.Second)
    }
}




Dmitry Vyukov

Jun 24, 2014, 3:19:23 AM6/24/14
to Alec Matusis, golang-nuts
Hi,

I don't think there is a solution today.
Most of the memory seems to be occupied by goroutine stacks, and we don't release that memory to the OS.
It will be somewhat better in the next release.







mat...@gmail.com

Jun 24, 2014, 12:49:31 PM6/24/14
to golan...@googlegroups.com, mat...@gmail.com
Hi Dmitry,

I wonder if, in a future release, you could introduce a call that the programmer can make just before the expected exit of a goroutine, indicating to the GC that this goroutine will never be reused, so that its stack can be freed back to the OS...
--Alec

Dmitry Vyukov

Jun 24, 2014, 1:30:20 PM6/24/14
to Alec Matusis, golang-nuts
But you do reuse goroutines for new requests, so it's unclear which goroutines would use this new call.
We can just do a better job in the runtime of releasing excessive stacks; there's no need for new APIs in the standard library.

Dmitry Vyukov

Jun 24, 2014, 1:32:09 PM6/24/14
to Alec Matusis, golang-nuts
Please file an issue at http://golang.org/issue/new
I believe this can be reduced to just creating lots of goroutines, waiting for them to terminate, and calling runtime/debug.FreeOSMemory several times; RSS still stays high afterwards.
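A rough sketch of that reduction (untested; the goroutine count and stack depth are arbitrary):

// stackrepro.go: sketch of the reduction described above. Spawn many
// goroutines that each grow their stack a little, wait for them all to
// exit, call runtime/debug.FreeOSMemory a few times, and compare the
// runtime's view of memory with the RSS reported by top.
package main

import (
    "log"
    "runtime"
    "runtime/debug"
    "time"
)

// grow recurses to force the goroutine's stack to grow, then blocks on done.
func grow(depth int, done chan struct{}) {
    if depth > 0 {
        var pad [128]byte // keep some data on each frame
        _ = pad
        grow(depth-1, done)
        return
    }
    done <- struct{}{}
}

func report(tag string) {
    var ms runtime.MemStats
    runtime.ReadMemStats(&ms)
    log.Printf("%s: Sys=%d HeapSys=%d HeapReleased=%d StackSys=%d Goroutines=%d",
        tag, ms.Sys, ms.HeapSys, ms.HeapReleased, ms.StackSys, runtime.NumGoroutine())
}

func main() {
    const n = 100000 // arbitrary; large enough to make stack memory visible
    report("start")
    done := make(chan struct{})
    for i := 0; i < n; i++ {
        go grow(100, done)
    }
    for i := 0; i < n; i++ {
        <-done // each sender unblocks and exits once we receive here
    }
    report("goroutines exited")
    for i := 0; i < 5; i++ {
        debug.FreeOSMemory()
        time.Sleep(time.Second)
    }
    report("after FreeOSMemory") // check RSS in top at this point as well
}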

mat...@gmail.com

Jun 25, 2014, 2:46:04 AM6/25/14
to golan...@googlegroups.com, mat...@gmail.com

Vincent Callanan

Jun 25, 2014, 5:02:23 AM6/25/14
to golan...@googlegroups.com, mat...@gmail.com
Dmitry,

Are there serious technical hurdles to "doing a better job in the runtime to release excessive stacks"?
Will we see something in 1.4?
Great strides have been made in 1.3.
However, we are clearly not there yet.
If this issue can be addressed, it will unleash Go's true potential.
Keep up the good work.

Vincent

Vincent Callanan

Jun 25, 2014, 5:59:40 AM6/25/14
to golan...@googlegroups.com, mat...@gmail.com
Sorry, I meant "Will we see [a fix] in 1.3.1?" (not 1.4)

Ian Lance Taylor

Jun 25, 2014, 4:23:24 PM6/25/14
to Vincent Callanan, mat...@gmail.com, golang-nuts


On Jun 25, 2014 2:59 AM, "Vincent Callanan" <vin...@callanan.ie> wrote:
>
> Sorry, I meant "Will we see [a fix] in 1.3.1?" (not 1.4)

No.  A change of this type will not go into a point release.

Ian


Colm McHugh

Jun 24, 2015, 5:48:34 PM6/24/15
to golan...@googlegroups.com, vin...@callanan.ie, mat...@gmail.com

>>>>>
>>>>> I don't think there is a solution today.
>>>>> Most of the memory seems to be occupied by goroutine stacks, and we don't release that memory to OS.
>>>>> It will be somewhat better in the next release.
>>>>>

Will this be supported in a future release? If so, which one?

Thanks,
Colm. 

keith....@gmail.com

Jun 24, 2015, 10:47:15 PM6/24/15
to golan...@googlegroups.com, mat...@gmail.com, vin...@callanan.ie
We now (as of 1.4) release goroutine stack memory to the OS.  So your example should behave much better.  Please try 1.4.2 and tip if you can, and open a new bug if you're still seeing problems.  There are two known caveats which you might run into:

1) There is a kernel bug where freed pages aren't actually freed because of huge page support, see https://github.com/golang/go/issues/8832 .  There is a workaround checked in for 1.5.
2) Stacks are freed, but not G structures, see https://github.com/golang/go/issues/8832 .  We might get to this in 1.6.
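One rough way to tell case (1) apart from the runtime simply holding on to memory is to compare what the runtime believes it has released with the RSS the kernel still charges to the process. A Linux-only sketch follows; the vmRSS helper is ours, not a standard API.

// Linux-only sketch: if HeapReleased is large but VmRSS barely shrinks, you
// are probably seeing the huge-page issue (1) rather than the runtime holding
// on to memory.
package main

import (
    "bufio"
    "log"
    "os"
    "runtime"
    "strings"
)

// vmRSS returns the VmRSS value from /proc/self/status, e.g. "56340 kB".
func vmRSS() string {
    f, err := os.Open("/proc/self/status")
    if err != nil {
        return "unknown (" + err.Error() + ")"
    }
    defer f.Close()
    s := bufio.NewScanner(f)
    for s.Scan() {
        if strings.HasPrefix(s.Text(), "VmRSS:") {
            return strings.TrimSpace(strings.TrimPrefix(s.Text(), "VmRSS:"))
        }
    }
    return "unknown"
}

func main() {
    var ms runtime.MemStats
    runtime.ReadMemStats(&ms)
    log.Printf("runtime: Sys=%d HeapReleased=%d StackSys=%d", ms.Sys, ms.HeapReleased, ms.StackSys)
    log.Printf("kernel:  VmRSS=%s", vmRSS())
}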

Colm McHugh

Jun 25, 2015, 8:40:12 AM6/25/15
to golan...@googlegroups.com, mat...@gmail.com, keith....@gmail.com, vin...@callanan.ie


On Thursday, June 25, 2015 at 3:47:15 AM UTC+1, keith....@gmail.com wrote:
We now (as of 1.4) release goroutine stack memory to the OS. [...]
 
1) I'm seeing much better performance using 1.4.2 (vs 1.4) w.r.t process memory size reducing after a load test. Is this likely due to goroutine stack memory releasing to the OS?  
2) Is there a G structure instantiation per goroutine instance? Or is the structure shared by all goroutine instances? Does the overhead (structure size) vary per goroutine?
(I'll do some digging, appreciate any pointers)

keith....@gmail.com

Jun 25, 2015, 10:04:22 PM6/25/15
to golan...@googlegroups.com, vin...@callanan.ie, keith....@gmail.com, mat...@gmail.com


On Thursday, June 25, 2015 at 5:40:12 AM UTC-7, Colm McHugh wrote:


1) I'm seeing much better performance using 1.4.2 (vs 1.4) w.r.t process memory size reducing after a load test. Is this likely due to goroutine stack memory releasing to the OS?  

I don't know why 1.4.2 would be different from 1.4; they both have the same code for this feature as far as I know.
 
2) Is there a G structure instantiation per goroutine instance? Or is the structure shared by all goroutine instances? Does the overhead (structure size) vary per goroutine?
(I'll do some digging, appreciate any pointers)

To the runtime, each goroutine is basically a G structure (~300 bytes) and a stack (~2KB+).  The G structure is constant-sized.
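A quick way to sanity-check that figure is to park a large number of goroutines and divide the growth in MemStats.Sys by the count. Expect a few KB per goroutine rather than exactly 300 bytes + 2KB, since the measurement also includes allocator and scheduler overhead (a rough sketch, not an exact accounting):

// Rough per-goroutine footprint estimate: park n goroutines on a channel and
// see how much the runtime's total memory (MemStats.Sys) grows per goroutine.
package main

import (
    "fmt"
    "runtime"
)

func sysBytes() uint64 {
    var ms runtime.MemStats
    runtime.ReadMemStats(&ms)
    return ms.Sys
}

func main() {
    const n = 100000 // arbitrary
    before := sysBytes()
    block := make(chan struct{})
    for i := 0; i < n; i++ {
        go func() { <-block }() // each goroutine parks on its initial stack
    }
    after := sysBytes()
    fmt.Printf("~%d bytes per goroutine (G struct + stack + overhead)\n", (after-before)/n)
    close(block) // release the goroutines before exiting
}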
 