Thank you, Go team, for all your work on this! I (and I think many others!) love all these behind-the-scenes changes that make our lives better.
I went looking for that 5% performance boost in my GoAWK interpreter (due to the new register-based calling convention), and found a 38% reduction in run time instead!
$ time goawk_go1.16 'BEGIN { for (i=0; i<100000000; i++) s += i; print(s) }'
4999999950000000
real 0m10.158s ...
$ time goawk_go1.17 'BEGIN { for (i=0; i<100000000; i++) s += i; print(s) }'
4999999950000000
real 0m6.268s ...
After some pointers from people (see https://news.ycombinator.com/item?id=28203608) and some digging into the assembly, I found that many of the function-call-heavy methods in interp/interp.go do a lot less stack manipulation (as expected), and their compiled code is also only 60-70% of the size of the versions compiled with Go 1.16. The overall binary is not 60-70% of its former size, of course (it's "only" 7% smaller), but the fact that these hot-loop interpreter functions are so much smaller is a good indication they're doing a lot less work, as confirmed by the perf numbers.
I know that some people haven't seen as much as a 5% improvement, and a few things actually slowed down, but I just wanted to share a success story.
-Ben