[go] runtime: faster version of findfunc

93 views
Skip to first unread message

Keith Randall (Gerrit)

unread,
Dec 27, 2014, 11:29:07 PM12/27/14
to Russ Cox, Keith Randall, golang-co...@googlegroups.com
Reviewers: Russ Cox

Keith Randall uploaded a change:
https://go-review.googlesource.com/2097

runtime: faster version of findfunc

Use a lookup table to find the function which contains a pc. It is
faster than the old binary search. findfunc is used primarily for
stack copying and garbage collection.

benchmark old ns/op new ns/op delta
BenchmarkStackCopy 294746596 255400980 -13.35%

(findfunc is one of several tasks done by stack copy, the findfunc
time itself is about 2.5x faster.)

The lookup table is built at link time. The table grows the binary
size by about 0.5% of the text segment.

We impose a lower limit of 16 bytes on any function, which should not
have much of an impact. (The real constraint required is <=256
functions in every 4096 bytes, but 16 bytes/function is easier to
implement.)

Change-Id: Ic315b7a2c83e1f7203cd2a50e5d21a822e18fdca
---
M src/cmd/ld/data.c
M src/cmd/ld/lib.h
M src/cmd/ld/pcln.c
M src/cmd/ld/pobj.c
M src/runtime/stack_test.go
M src/runtime/symtab.go
6 files changed, 113 insertions(+), 22 deletions(-)



diff --git a/src/cmd/ld/data.c b/src/cmd/ld/data.c
index 92c6fb5..bfcdcd6 100644
--- a/src/cmd/ld/data.c
+++ b/src/cmd/ld/data.c
@@ -1273,7 +1273,10 @@
sub->value += va;
if(sym->size == 0 && sym->sub != S)
ctxt->cursym = sym;
- va += sym->size;
+ if(sym->size < MINFUNC)
+ va += MINFUNC; // spacing required for findfunctab
+ else
+ va += sym->size;
}
sect->len = va - sect->vaddr;
}
diff --git a/src/cmd/ld/lib.h b/src/cmd/ld/lib.h
index 17483e0..fd84c8b 100644
--- a/src/cmd/ld/lib.h
+++ b/src/cmd/ld/lib.h
@@ -35,6 +35,7 @@

enum {
MAXIO = 8192,
+ MINFUNC = 16, // minimum size for a function
};

typedef struct Segment Segment;
@@ -260,6 +261,7 @@
int pathchar(void);
void pcln(void);
void pclntab(void);
+void findfunctab(void);
void putelfsectionsym(LSym* s, int shndx);
void putelfsymshndx(vlong sympos, int shndx);
void putsymb(LSym *s, char *name, int t, vlong v, vlong size, int ver,
LSym *typ);
diff --git a/src/cmd/ld/pcln.c b/src/cmd/ld/pcln.c
index 69671c0..40d5860 100644
--- a/src/cmd/ld/pcln.c
+++ b/src/cmd/ld/pcln.c
@@ -242,3 +242,56 @@
if(debug['v'])
Bprint(&bso, "%5.2f pclntab=%lld bytes, funcdata total %lld bytes\n",
cputime(), (vlong)ftab->size, (vlong)funcdata_bytes);
}
+
+// findfunctab generates a lookup table to quickly find the containing
+// function for a pc. See src/runtime/symtab.go:findfunc for details.
+void
+findfunctab(void)
+{
+ LSym *t, *s;
+ int32 idx, bidx, i, j, nbuckets;
+ vlong min, max;
+
+ const int bucketsize = 256*MINFUNC;
+ const int subbuckets = 16;
+
+ t = linklookup(ctxt, "runtime.findfunctab", 0);
+ t->type = SRODATA;
+ t->reachable = 1;
+
+ // find min and max address
+ min = ctxt->textp->value;
+ max = 0;
+ for(s = ctxt->textp; s != nil; s = s->next)
+ max = s->value + s->size;
+
+ // allocate table
+ nbuckets = (max-min+bucketsize-1)/bucketsize;
+ symgrow(ctxt, t, nbuckets * (4+subbuckets));
+
+ // fill in table
+ s = ctxt->textp;
+ idx = 0;
+ for(i = 0; i < nbuckets; i++) {
+ // Find first function which overlaps this bucket.
+ // Only do leaf symbols; skip symbols which are just containers (sub !=
nil but outer == nil).
+ while(s != nil && (s->value+s->size <= min + i * bucketsize || s->sub !=
nil && s->outer == nil)) {
+ s = s->next;
+ idx++;
+ }
+ // record this function in bucket header
+ setuint32(ctxt, t, i*(4+subbuckets), idx);
+ bidx = idx;
+
+ // compute subbuckets deltas
+ for(j = 0; j < subbuckets; j++) {
+ while(s != nil && (s->value+s->size <= min + i * bucketsize + j *
(bucketsize/subbuckets) || s->sub != nil && s->outer == nil)) {
+ s = s->next;
+ idx++;
+ }
+ if(idx - bidx >= 256)
+ diag("too many functions in a findfunc bucket! %d %s", idx-bidx,
s->name);
+ setuint8(ctxt, t, i*(4+subbuckets)+4+j, idx-bidx);
+ }
+ }
+}
diff --git a/src/cmd/ld/pobj.c b/src/cmd/ld/pobj.c
index b86ddfe..8ecd18b 100644
--- a/src/cmd/ld/pobj.c
+++ b/src/cmd/ld/pobj.c
@@ -187,6 +187,7 @@
gentext(); // trampolines, call stubs, etc.
textaddress();
pclntab();
+ findfunctab();
symtab();
dodata();
address();
diff --git a/src/runtime/stack_test.go b/src/runtime/stack_test.go
index 652c72e..17e3332 100644
--- a/src/runtime/stack_test.go
+++ b/src/runtime/stack_test.go
@@ -395,3 +395,21 @@
useStack(32)
panic("test panic")
}
+
+func BenchmarkStackCopy(b *testing.B) {
+ c := make(chan bool)
+ for i := 0; i < b.N; i++ {
+ go func() {
+ count(1000000)
+ c <- true
+ }()
+ <-c
+ }
+}
+
+func count(n int) int {
+ if n == 0 {
+ return 0
+ }
+ return 1 + count(n-1)
+}
diff --git a/src/runtime/symtab.go b/src/runtime/symtab.go
index 8a6ed02..78f32f7 100644
--- a/src/runtime/symtab.go
+++ b/src/runtime/symtab.go
@@ -34,12 +34,30 @@
ftab []functab
filetab []uint32

- pclntab, epclntab struct{} // linker symbols
+ pclntab, epclntab, findfunctab struct{} // linker symbols
+
+ minpc, maxpc uintptr
)

type functab struct {
entry uintptr
funcoff uintptr
+}
+
+const minfunc = 16 // minimum function size
+const pcbucketsize = 256*minfunc // size of bucket in the pc->func lookup
table
+
+// findfunctab is an array of these structures.
+// Each bucket represents 4096 bytes of the text segment.
+// Each subbucket represents 256 bytes of the text segment.
+// To find a function given a pc, locate the bucket and subbucket for
+// that pc. Add together the idx and subbucket value to obtain a
+// function index. Then scan the functab array starting at that
+// index to find the target function.
+// This table uses 20 bytes for every 4096 bytes of code, or ~0.5%
overhead.
+type findfuncbucket struct {
+ idx uint32
+ subbuckets [16]byte
}

func symtabinit() {
@@ -96,6 +114,9 @@
sp.cap = 1
sp.len = int(filetab[0])
sp.cap = sp.len
+
+ minpc = ftab[0].entry
+ maxpc = ftab[nftab].entry
}

// FuncForPC returns a *Func describing the function that contains the
@@ -126,32 +147,25 @@
}

func findfunc(pc uintptr) *_func {
- if len(ftab) == 0 {
+ if pc < minpc || pc >= maxpc {
return nil
}
+ const nsub = uintptr(len(findfuncbucket{}.subbuckets))

- if pc < ftab[0].entry || pc >= ftab[len(ftab)-1].entry {
- return nil
+ x := pc - minpc
+ b := x / pcbucketsize
+ i := x % pcbucketsize / (pcbucketsize/nsub)
+
+ ffb := (*findfuncbucket)(add(unsafe.Pointer(&findfunctab), b *
unsafe.Sizeof(findfuncbucket{})))
+ idx := ffb.idx + uint32(ffb.subbuckets[i])
+ if pc < ftab[idx].entry {
+ gothrow("bad entry")
}

- // binary search to find func with entry <= pc.
- lo := 0
- nf := len(ftab) - 1 // last entry is sentinel
- for nf > 0 {
- n := nf / 2
- f := &ftab[lo+n]
- if f.entry <= pc && pc < ftab[lo+n+1].entry {
- return (*_func)(unsafe.Pointer(&pclntable[f.funcoff]))
- } else if pc < f.entry {
- nf = n
- } else {
- lo += n + 1
- nf -= n + 1
- }
+ // linear search to find func with pc >= entry.
+ for ; pc >= ftab[idx+1].entry; idx++ {
}
-
- gothrow("findfunc: binary search failed")
- return nil
+ return (*_func)(unsafe.Pointer(&pclntable[ftab[idx].funcoff]))
}

func pcvalue(f *_func, off int32, targetpc uintptr, strict bool) int32 {

--
https://go-review.googlesource.com/2097
Gerrit-Reviewer: Russ Cox <r...@golang.org>

Russ Cox (Gerrit)

unread,
Jan 7, 2015, 3:35:56 PM1/7/15
to Keith Randall, Russ Cox, golang-co...@googlegroups.com
Russ Cox has posted comments on this change.

runtime: faster version of findfunc

Patch Set 1: Code-Review+2

(2 comments)

Very nice.

https://go-review.googlesource.com/#/c/2097/1/src/cmd/ld/pcln.c
File src/cmd/ld/pcln.c:

Line 255: const int bucketsize = 256*MINFUNC;
This only works because the compiler discards the const. Use enum for
constants.


https://go-review.googlesource.com/#/c/2097/1/src/runtime/symtab.go
File src/runtime/symtab.go:

Line 166: for ; pc >= ftab[idx+1].entry; idx++ {
slightly clearer (actual body, same form of comparison as on 161).

for ftab[idx+1].entry < pc {
idx++
Gerrit-HasComments: Yes

Keith Randall (Gerrit)

unread,
Jan 7, 2015, 4:16:09 PM1/7/15
to Keith Randall, Russ Cox, golang-co...@googlegroups.com
Reviewers: Russ Cox

Keith Randall uploaded a new patch set:
https://go-review.googlesource.com/2097

runtime: faster version of findfunc

Use a lookup table to find the function which contains a pc. It is
faster than the old binary search. findfunc is used primarily for
stack copying and garbage collection.

benchmark old ns/op new ns/op delta
BenchmarkStackCopy 294746596 255400980 -13.35%

(findfunc is one of several tasks done by stack copy, the findfunc
time itself is about 2.5x faster.)

The lookup table is built at link time. The table grows the binary
size by about 0.5% of the text segment.

We impose a lower limit of 16 bytes on any function, which should not
have much of an impact. (The real constraint required is <=256
functions in every 4096 bytes, but 16 bytes/function is easier to
implement.)

Change-Id: Ic315b7a2c83e1f7203cd2a50e5d21a822e18fdca
---
M src/cmd/ld/data.c
M src/cmd/ld/lib.h
M src/cmd/ld/pcln.c
M src/cmd/ld/pobj.c
M src/runtime/stack_test.go
M src/runtime/symtab.go
6 files changed, 116 insertions(+), 22 deletions(-)

Keith Randall (Gerrit)

unread,
Jan 7, 2015, 4:23:46 PM1/7/15
to Keith Randall, Russ Cox, golang-co...@googlegroups.com
Reviewers: Russ Cox

Keith Randall uploaded a new patch set:
https://go-review.googlesource.com/2097

runtime: faster version of findfunc

Use a lookup table to find the function which contains a pc. It is
faster than the old binary search. findfunc is used primarily for
stack copying and garbage collection.

benchmark old ns/op new ns/op delta
BenchmarkStackCopy 294746596 255400980 -13.35%

(findfunc is one of several tasks done by stack copy, the findfunc
time itself is about 2.5x faster.)

The lookup table is built at link time. The table grows the binary
size by about 0.5% of the text segment.

We impose a lower limit of 16 bytes on any function, which should not
have much of an impact. (The real constraint required is <=256
functions in every 4096 bytes, but 16 bytes/function is easier to
implement.)

Change-Id: Ic315b7a2c83e1f7203cd2a50e5d21a822e18fdca
---
M src/cmd/ld/data.c
M src/cmd/ld/lib.h
M src/cmd/ld/pcln.c
M src/cmd/ld/pobj.c
M src/runtime/symtab.go
5 files changed, 98 insertions(+), 22 deletions(-)

Keith Randall (Gerrit)

unread,
Jan 7, 2015, 4:24:24 PM1/7/15
to Keith Randall, golang-...@googlegroups.com, Russ Cox, golang-co...@googlegroups.com
Keith Randall has submitted this change and it was merged.

runtime: faster version of findfunc

Use a lookup table to find the function which contains a pc. It is
faster than the old binary search. findfunc is used primarily for
stack copying and garbage collection.

benchmark old ns/op new ns/op delta
BenchmarkStackCopy 294746596 255400980 -13.35%

(findfunc is one of several tasks done by stack copy, the findfunc
time itself is about 2.5x faster.)

The lookup table is built at link time. The table grows the binary
size by about 0.5% of the text segment.

We impose a lower limit of 16 bytes on any function, which should not
have much of an impact. (The real constraint required is <=256
functions in every 4096 bytes, but 16 bytes/function is easier to
implement.)

Change-Id: Ic315b7a2c83e1f7203cd2a50e5d21a822e18fdca
Reviewed-on: https://go-review.googlesource.com/2097
Reviewed-by: Russ Cox <r...@golang.org>
---
M src/cmd/ld/data.c
M src/cmd/ld/lib.h
M src/cmd/ld/pcln.c
M src/cmd/ld/pobj.c
M src/runtime/symtab.go
5 files changed, 98 insertions(+), 22 deletions(-)

Approvals:
Russ Cox: Looks good to me, approved


--
https://go-review.googlesource.com/2097
Gerrit-Reviewer: Keith Randall <k...@golang.org>
Gerrit-Reviewer: Russ Cox <r...@golang.org>
Reply all
Reply to author
Forward
0 new messages