386 GC problems

61 views
Skip to first unread message

Dmitry Vyukov

unread,
Aug 30, 2011, 3:50:57 AM8/30/11
to golan...@googlegroups.com, ush...@google.com
Hi,

I've created a testing program that imports all standard packages, then allocates 2 GB in equally sized blocks, then executes GC and then checks how many of the blocks are freed (source code is below, but it also requires patching mgc0.c to not do GC until manually triggered).

On Mac/386 it says that 90%(!) of 1MB blocks are not freed (there may be some measurement error, but I think not that big).
I've investigated the problem, and it turned out that basically all blocks are pinned by various tables in data segment of the program (after adding some debug output to mgc0.c and merging it with nm output I can see what symbols cause problems).

I think most cases can be fixed by turning static arrays into dynamic slices. For example, after applying the following patch:

------------------------------------------------------------
diff -r 7fec8679f10d src/pkg/math/pow10.go
--- a/src/pkg/math/pow10.go Fri Aug 26 17:45:19 2011 -0400
+++ b/src/pkg/math/pow10.go Tue Aug 30 11:29:47 2011 +0400
@@ -4,12 +4,18 @@
 
 package math
 
+import (
+ "sync"
+)
+
 // This table might overflow 127-bit exponent representations.
 // In that case, truncate it after 1.0e38.
-var pow10tab [70]float64
+var pow10tab []float64
+var pow10tabOnce sync.Once
 
 // Pow10 returns 10**e, the base-10 exponential of e.
 func Pow10(e int) float64 {
+ pow10tabOnce.Do(pow10tabInit)
  if e <= -325 {
  return 0
  } else if e > 309 {
@@ -26,7 +32,8 @@
  return Pow10(m) * Pow10(e-m)
 }
 
-func init() {
+func pow10tabInit() {
+ pow10tab = make([]float64, 70)
  pow10tab[0] = 1.0e0
  pow10tab[1] = 1.0e1
  for i := 2; i < len(pow10tab); i++ {
------------------------------------------------------------

3 additional 1MB blocks get freed. Or when I use 16-byte blocks, 70(!) additional blocks get freed, that basically means that every float64 pins a heap object.

Before submitting any changes, I would like to know as to whether you think it (turning static arrays into dynamic slices) is the right approach or not. Potentially it also reduces exec size and/or startup times, fortunately sync.Once is quite fast now :)


------------------------------------------------------------
// the testing utility
package main

import (
"flag"
"fmt"
"os"
"runtime"
)

import (
_ "container/heap"
_ "container/list"
_ "container/ring"
_ "container/vector"
_ "flag"
_ "json"
_ "path"
_ "strings"
_ "unsafe"
_ "crypto"
_ "fmt"
_ "log"
_ "rand"
_ "sync"
_ "url"
_ "archive/tar"
_ "archive/zip"
_ "csv"
_ "go/ast"
_ "go/build"
_ "go/doc"
_ "go/parser"
_ "go/printer"
_ "go/scanner"
_ "go/token"
_ "go/typechecker"
_ "go/types"
_ "mail"
_ "reflect"
_ "syscall"
_ "utf16"
_ "asn1"
_ "debug/dwarf"
_ "debug/elf"
_ "debug/gosym"
_ "debug/macho"
_ "debug/pe"
_ "gob"
_ "math"
_ "regexp"
_ "syslog"
_ "utf8"
_ "big"
_ "hash"
_ "mime"
_ "rpc"
_ "tabwriter"
_ "websocket"
_ "bufio"
_ "ebnf"
_ "html"
_ "net"
_ "runtime"
_ "template"
_ "xml"
_ "encoding/ascii85"
_ "encoding/base32"
_ "encoding/base64"
_ "encoding/binary"
_ "encoding/git85"
_ "encoding/hex"
_ "encoding/pem"
_ "http"
_ "netchan"
_ "scanner"
_ "testing"
_ "bytes"
_ "exec"
_ "image"
_ "old/template"
_ "smtp"
_ "time"
_ "cmath"
_ "exp/datafmt"
_ "exp/gui"
_ "exp/gui/x11"
_ "exp/norm"
_ "exp/regexp/syntax"
_ "exp/template/html"
_ "index/suffixarray"
_ "os"
_ "sort"
_ "try"
_ "compress/bzip2"
_ "compress/flate"
_ "compress/gzip"
_ "compress/lzw"
_ "compress/zlib"
_ "expvar"
_ "io"
_ "patch"
_ "strconv"
_ "unicode"
)

func main() {
flagSize := flag.Int("size", 0, "alloc block size in bytes (must be a power of 2)")
flagMem := flag.Int("mem", 0, "total mem to allocate in MB")
flag.Parse()
if *flagSize <= 0 || *flagSize&(*flagSize-1) != 0 || *flagMem <= 0 {
flag.PrintDefaults()
os.Exit(1)
}
sz := uintptr(*flagSize)
cnt := *flagMem * 1024 * 1024 / (*flagSize)

a0 := runtime.MemStats.Mallocs - runtime.MemStats.Frees
for i := 0; i < cnt; i++ {
p := make([]byte, sz)
func(p []byte) {
}(p)
}
a1 := runtime.MemStats.Mallocs - runtime.MemStats.Frees - a0
runtime.GC()
a2 := runtime.MemStats.Mallocs - runtime.MemStats.Frees - a0

fmt.Printf("%.2f%% pinned (%d)\n", float64(a2)*100/float64(a1), a2)
fmt.Printf("%dMB wasted\n", a2*100*uint64(*flagMem)/a1)
}

Russ Cox

unread,
Aug 30, 2011, 7:57:36 AM8/30/11
to golan...@googlegroups.com, ush...@google.com
You don't have to use sync.Once to move things into the heap.
It would suffice to change one line:

var pow10tab [70]float64

to

var pow10tab = make([]float64, 70)

However, this is not the right approach anyway. You're letting
an implementation concern limit the way you use the language.
That's almost never the right long-term strategy. It's definitely
a problem we need to address but making people change their
code to work around it is not the way.

Russ

Dmitry Vyukov

unread,
Aug 30, 2011, 8:29:25 AM8/30/11
to r...@golang.org, golan...@googlegroups.com, ush...@google.com


I understand your point, and that's why I am asking.
However, I guess there is no simple way to fix it on GC level
(otherwise it would be already fixed). Moreover, it's not about user's
code, it's about standard library code (I perfectly understand that
users can create similar tables, but I think it's not quite common for
Go programs to create static tables of floats). Moreover, maybe it's a
better way to do it GC problem aside, data segment of the program is
250k + increased GC time for scanning of it + increased startup time
(I see ±15ms startup penalty for the program). In either case, the
issue renders Go basically useless on 386 for a lot of programs.
WDYT?

Russ Cox

unread,
Aug 30, 2011, 9:02:49 AM8/30/11
to Dmitry Vyukov, golan...@googlegroups.com, ush...@google.com
I think we need a better solution than avoiding
certain parts of the Go language.

Russ

Ian Lance Taylor

unread,
Aug 30, 2011, 9:21:17 AM8/30/11
to Dmitry Vyukov, r...@golang.org, golan...@googlegroups.com, ush...@google.com
Dmitry Vyukov <dvy...@google.com> writes:

I'm not Russ but I think this should just be fixed in the
compiler/linker. In gccgo I don't scan the entire data segment for
pointers. Instead, at program startup time all global variables which
may contain pointers are registered, and the GC scans those. This
approach is not optimal, but it does avoid this problem, since of course
a variable of type [70]float need not be registered.

Using 6l/8l we can do better. There is no implied ordering of global
variables, so the linker can order all the global variables which may
contain pointers together, and define symbols to mark the start and end
of this memory area. Then we change mark in runtime/mgc0.c to use those
symbols rather than scanning the entire data segment (minus mheap) as it
does today.

Ian

Dmitry Vyukov

unread,
Aug 30, 2011, 10:45:25 AM8/30/11
to r...@golang.org, golan...@googlegroups.com, ush...@google.com
On Tue, Aug 30, 2011 at 5:02 PM, Russ Cox <r...@golang.org> wrote:
I think we need a better solution than avoiding
certain parts of the Go language.


Nothing to object to.

Dmitry Vyukov

unread,
Aug 30, 2011, 10:53:45 AM8/30/11
to Ian Lance Taylor, r...@golang.org, golan...@googlegroups.com, ush...@google.com
If we move in the direction of preciser GC, for heap objects we can use the GC bitmap in some way (per-word bitNoPointers). But that still does not solve the problem for static data (it is not covered by the bitmap), so we need a separate solution for static data in either case. Initially I was thinking about a separate data section, but it's essentially the same as yours. Both are beyond my capabilities.

Reply all
Reply to author
Forward
0 new messages