Hello lvd (cc: golan...@googlegroups.com),
I'd like you to review this change to
https://go.googlecode.com/hg
Description:
gc: tweak and enable escape analysis
-s now means *disable* escape analysis.
Fix escape leaks for struct/slice/map literals.
Add ... tracking.
Rewrite new(T) and slice literal into stack allocation when safe.
Add annotations to reflect.
Reflect is too chummy with the compiler,
so changes like these affect it more than they should.
Please review this at http://codereview.appspot.com/4954043/
Affected files:
M src/cmd/gc/esc.c
M src/cmd/gc/gen.c
M src/cmd/gc/go.h
M src/cmd/gc/lex.c
M src/cmd/gc/sinit.c
M src/cmd/gc/typecheck.c
M src/cmd/gc/walk.c
M src/pkg/reflect/value.go
M test/escape2.go
$ gotest
rm -f _test/fmt.a
6g -o _gotest_.6 doc.go format.go print.go scan.go
rm -f _test/fmt.a
gopack grc _test/fmt.a _gotest_.6
mallocs per Sprintf(""): 0
mallocs per Sprintf("xxx"): 1
mallocs per Sprintf("%x"): 1
mallocs per Sprintf("%x %x"): 1
PASS
$
Tested on exp/go/eval too, I trust?
http://codereview.appspot.com/4954043/diff/7001/src/cmd/gc/esc.c
File src/cmd/gc/esc.c (right):
http://codereview.appspot.com/4954043/diff/7001/src/cmd/gc/esc.c#newcode266
src/cmd/gc/esc.c:266: } else {
I think these three cases can be simplified into one: the initialisation of
esc, curfn and loopdepth can always happen, and for slices and maps you
can set dst=thesink before the loop.
http://codereview.appspot.com/4954043/diff/7001/src/cmd/gc/go.h
File src/cmd/gc/go.h (left):
http://codereview.appspot.com/4954043/diff/7001/src/cmd/gc/go.h#oldcode294
src/cmd/gc/go.h:294: int escfloodgen; // increased for every flood to detect loops
What is it with this desire to make global variables have as many uses
as possible? I argued against this before.
> Tested on exp/go/eval too, I trust?
And if so, LGTM (I always forget to say this). LGTM. Let's have escape
analysis before the weekend :-)
lvd@dhcp-172-28-208-194:[~/Project/go/src/pkg/exp/eval]$ make test
gotest
rm -f _test/exp/eval.a
6g -o _gotest_.6 abort.go bridge.go compiler.go expr.go expr1.go
func.go scope.go stmt.go type.go typec.go value.go world.go
eval_test.go expr_test.go stmt_test.go
rm -f _test/exp/eval.a
gopack grc _test/exp/eval.a _gotest_.6
--- FAIL: eval.TestExpr (0.04 seconds)
exprTests[105]: Run ^u; = *eval.uintV(248) want *eval.uintV(4294967294)
exprTests[108]: Run 1+u; = *eval.uintV(248) want *eval.uintV(2)
exprTests[200]: Run u<<2; = *eval.uintV(248) want *eval.uintV(4)
exprTests[204]: Run u<<2.0; = *eval.uintV(248) want *eval.uintV(4)
exprTests[206]: Run u<<u; = *eval.uintV(248) want *eval.uintV(2)
exprTests[208]: Run u<<u; = *eval.uintV(248) want *eval.uintV(2)
unexpected fault address 0x80263498
throw: fault
[signal 0xb code=0x1 addr=0x80263498 pc=0xf84016704c]
So no.
http://codereview.appspot.com/4954043/diff/7001/test/escape2.go#newcode746
test/escape2.go:746: m := &Bar{ii: x} // ERROR "&struct literal escapes to heap"
When I run it, it says it doesn't, and I think that's how it should be.
It's a general mechanism for doing a
custom walk that doesn't revisit nodes.
Why make it available to only one module?
The global was already in go.h, precisely
because I had used it (or almost used it)
in the past too.
I found the exp/eval bug - I was missing else blocks.
To fix that I added the obvious line but also defined
what the structure of the tree is, so there is a reference.
http://codereview.appspot.com/4954043/diff2/7001:21001/src/cmd/gc/go.h
That turned up another bug: in esccall, fn && fn->type
was being used as the condition for detecting a function
for which we have a body. That was also catching variables
of func type, so I restricted the condition a bit (and added tests).
Also, for debugging, I found it very useful to print the
things that *don't* escape, since it is easier to look for
a spurious 'does not escape' print than to look for a
missing 'does escape' print. So I added that. Those prints
and the escape prints are printed via warnl so that the
compiler can sort them by line number. For large
packages it was very confusing to get them in essentially
random order. True debugging prints (-mm) are still prints.
You can see the effect of the extra prints in escape2.go.
Thus, changes in:
http://codereview.appspot.com/4954043/diff2/7001:21001/src/cmd/gc/esc.c
* add nelse
* restrict esccall condition
* change print to warnl, add 'does not escape' prints
I tried your suggestion about merging the three
literal cases, but they are all slightly different
and it seemed less clear what was going on
in each case.
Russ
http://codereview.appspot.com/4954043/diff/21001/src/cmd/gc/esc.c
File src/cmd/gc/esc.c (right):
http://codereview.appspot.com/4954043/diff/21001/src/cmd/gc/esc.c#newcode89
src/cmd/gc/esc.c:89: for(l=noesc; l; l=l->next)
Instead of a separate list, why not just for(l=xtop...) for(ll=l->n->dcl...)?
http://codereview.appspot.com/4954043/diff/21001/src/cmd/gc/go.h
File src/cmd/gc/go.h (right):
http://codereview.appspot.com/4954043/diff/21001/src/cmd/gc/go.h#newcode220
src/cmd/gc/go.h:220: NodeList* ninit;
Can we put this in the order the side effects should happen, i.e. the
order walk should walk?
--
Gustavo Niemeyer
http://niemeyer.net
Just as an empirical data point, the same benchmark with mgo that got
a speed boost from Dmitry's work on concurrency is losing a few
seconds with this change in place, going from ~16.1s to ~18.5s. The
benchmark consists of querying 70k documents out of a MongoDB database
and unmarshalling them, and these times are seen with GOMAXPROCS=4.
With GOMAXPROCS=1, the same benchmark takes ~46.5s without escape
analysis and ~52.6s with it enabled.
"Enabled" is this CL without changes, and "disabled" is this CL with
debug['s']++ in gc/lex.c.
Unsurprisingly, the cost seems to come from stack handling. Without
this CL, the old/newstack pair is called ~157k times, while with this
CL the pair is called ~25M times. It may be worth tweaking the stack
size numbers a bit.
Confirming: doubling StackMin and StackBig makes the old/newstack pair
be called ~11k times and puts the timing back in the ~16-second
range.
Playing games with stack size to help one program
just hurts others. It's not a rational strategy.
Russ
Clearly, but I hope you're speaking about yourself, since you are the
one playing games with the stack size; all I did was report the
outcome of your game with a real-world example submitted by a user of
mgo a while ago. Unlike the made-up examples, this one has suffered a
two-orders-of-magnitude penalty in the number of calls to
old/newstack, which is entirely natural given that you are increasing
the stack pressure across the board.
So yes, I guess what you say makes sense: playing games with the
stack size may hurt other programs. Now, I wouldn't go as far as
accusing you of being irrational, since that's unnecessarily harsh. I
would instead suggest investigating a bit more whether tweaking the
stack size would avoid the penalty caused by the additional pressure,
without significantly impacting other programs, before jumping to
conclusions.
It's not that simple. Your explanation makes it sound
like there is a nicer linear correlation: more things on the stack,
more stack splits, slower run time. But there are not 100x
more things on the stack, so there are not 100x more stack
splits.
What's happening here is that at certain stack depths,
any call from that depth is more expensive than a normal call,
because it starts a new stack, and the corresponding return
is also more expensive, because it has to jump back to
the original stack and free the old one. Compiler changes
that affect the size of stack frames change which function
calls in a particular program are affected by this: some
formerly expensive ones become cheap, and some
formerly cheap ones become expensive. If your program
was spending most of its time calling the expensive ones
before, it can get measurably faster. If it was spending most
of its time calling the cheap ones before, it can get measurably
slower.
Whether a program gets faster or slower is not a simple
function of 'stack pressure' except in the limiting cases
where you make the stacks so big that no call ever
needs to make a new one or so small that every call does.
In the normal cases it is just a question of whether the
stack depths that have split costs are also stack depths
that make a lot of function calls. You can change that
equality by making the stacks a bit larger but also by
making the stacks a bit smaller. We also had reports of
programs getting slower (no one ever reports when their
programs get faster) for Luuk's CL that shortened stack
frames by eliminating dead spaces. If you graphed the
number of calls made by a program as a function of
stack depth, you'd find it to be very spiky. There are hot spots.
If the hot spots line up with stack splits, then you're
in trouble. If not, you're fine.
These slowdowns are an unfortunate, unpredictable side
effect of the stack splitting, and changing the numbers to
make one program faster makes other programs slower.
There's no obvious win here, which is why I said that
tweaking stack sizes is not a rational strategy for fixing
the problem. (It is a great strategy for pinning a slowdown
on stack splitting, though.)
The right long term solution is algorithmic, so that after
a few hits at a particular boundary we can shorten or
lengthen the stack segment above it in order to make
that boundary not expensive anymore. However, being
able to do that implies being able to move stack frames
from one segment to another, which requires being able
to tell, at any function call, which words in a stack frame
are pointers and which are not, so that pointers to things
in the stack frame can be adjusted. It's a non-trivial
amount of work.
Russ
The noesc list contains expressions like &x and new(T),
not just declarations.
> http://codereview.appspot.com/4954043/diff/21001/src/cmd/gc/go.h#newcode220
> src/cmd/gc/go.h:220: NodeList* ninit;
> can we put this in the order the side effects should happen, i.e. the
> order walk should walk?
Except for ninit first, I don't know if there is one.
Walk doesn't just blindly traverse that list:
it does different things based on the kind of node.
Russ
gc: tweak and enable escape analysis
-s now means *disable* escape analysis.
Fix escape leaks for struct/slice/map literals.
Add ... tracking.
Rewrite new(T) and slice literal into stack allocation when safe.
Add annotations to reflect.
Reflect is too chummy with the compiler,
so changes like these affect it more than they should.
R=lvd, dave, gustavo
CC=golang-dev
http://codereview.appspot.com/4954043
> It's not that simple. Your explanation makes it sound
> like there is a nicer linear correlation: more things on the stack,
(...)
> was spending most of its time calling the expensive ones
> before, it can get measurably faster. If it was spending most
> of its time calling the cheap ones before, it can get measurably
> slower.
(...)
Understood the explanation, and agreed. As a detail, the overall
problem description makes it sound like it's merely a matter of
positioning, and seems to ignore the fact that escape analysis is
indeed increasing the frame size in some cases, which means a
shallower call depth triggers the segmentation logic. This is of
course more relevant in cases such as marshalling and unmarshalling,
which depend on recursing up and down more heavily. Is that a valid
statement?
> making the stacks a bit smaller. We also had reports of
> programs getting slower (no one ever reports when their
> programs get faster) for Luuk's CL that shortened stack
I did report a speedup recently, coincidentally with the same example.
> The right long term solution is algorithmic, so that after
> a few hits at a particular boundary we can shorten or
> lengthen the stack segment above it in order to make
> that boundary not expensive anymore. However, being
> able to do that implies being able to move stack frames
> from one segment to another, which requires being able
> to tell, at any function call, which words in a stack frame
> are pointers and which are not, so that pointers to things
> in the stack frame can be adjusted. It's a non-trivial
> amount of work.
Sounds tricky indeed. If the above point is valid, might it be
easier/more relevant to algorithmically determine the likelihood that
certain functions will recurse past the segment size, and add some
extra padding in those cases?
Yes, but only with enough caveats. For a heavily recursive
function, something like go/parser, it's zipping up and down
the stack across many segments like there's no tomorrow.
If it can only fit 0.9n frames into a segment where before it
could fit n frames, then it would see just a 1.1x slowdown
caused by stack splits.
To see the 100x that you are seeing, the most likely story is
that before you never did any stack splits, and now the
processing of leaves in your recursive function is typically
on another segment, so each leaf call is doing a stack split.
Before you might have needed 0.95 frames and now you
need 1.1 frames. Even so I am a little surprised that the
smaller load on the garbage collector does not make up
for the extra frame sizes.
If nothing else this is a good thing for profiling to record.
I created issue 2197.
Russ