Avoiding function call overhead in packages with go+asm implementations

Caleb Spare

Nov 17, 2016, 10:42:32 PM
to golang-nuts
I've run into an annoying problem. I'm not sure if there's a solution
I've overlooked and I'm hoping you folks have some ideas for me.

I have a library that provides a function Sum64(b []byte) uint64. I
have both asm and pure-go implementations of this function, and I want
both to be as fast as possible.

It works like this today:

x.go:
func Sum64(b []byte) uint64 { return sum64(b) }

func sum64Go(b []byte) uint64 { /* go implementation here */ }

x_noasm.go:
func sum64(b []byte) uint64 { return sum64Go(b) }

x_amd64.go:
func sum64(b []byte) uint64 // asm implementation in x_amd64.s

This allows me to have both Go and x64 implementations of my function,
and furthermore, in x_amd64_test.go, I can compare both
implementations against each other by calling sum64 (asm) and sum64Go
(Go).
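A minimal sketch of the kind of cross-check described above, assuming nothing about the real hash: both bodies here are placeholder FNV-1a loops, and crossCheck is a hypothetical helper name. In the real package sum64 would be the body-less declaration backed by x_amd64.s, and the comparison would live in x_amd64_test.go.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sum64Go is a placeholder for the pure-Go implementation. FNV-1a is used
// only so the sketch runs; the post does not show the real hash.
func sum64Go(b []byte) uint64 {
	h := uint64(14695981039346656037)
	for _, c := range b {
		h ^= uint64(c)
		h *= 1099511628211
	}
	return h
}

// sum64 stands in for the asm-backed function. It is written in Go here
// (assumption) because the sketch has no .s file to link against.
func sum64(b []byte) uint64 {
	h := uint64(14695981039346656037)
	for i := 0; i < len(b); i++ {
		h = (h ^ uint64(b[i])) * 1099511628211
	}
	return h
}

// crossCheck (hypothetical helper) hashes one random input of every length
// in [0, maxLen] with both implementations and reports the first mismatch.
func crossCheck(maxLen int, seed int64) error {
	rng := rand.New(rand.NewSource(seed))
	for n := 0; n <= maxLen; n++ {
		b := make([]byte, n)
		rng.Read(b)
		if got, want := sum64(b), sum64Go(b); got != want {
			return fmt.Errorf("len %d: sum64 = %#x, sum64Go = %#x", n, got, want)
		}
	}
	return nil
}

func main() {
	if err := crossCheck(256, 1); err != nil {
		fmt.Println("mismatch:", err)
		return
	}
	fmt.Println("implementations agree")
}
```

Covering every length from 0 up to a few hundred bytes tends to exercise the tail-handling paths where asm and Go versions most often diverge.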

Problem 1 is that every call to Sum64 incurs double function-call
overhead because of the indirection to sum64. This overhead is
significant for smallish inputs. I can work around it by getting rid
of sum64 and declaring Sum64 twice (once in x_noasm.go and once in
x_amd64.go). This is annoying because I need to maintain duplicate
documentation comments.

Problem 2 is that the pure-Go version of Sum64 incurs triple
function-call overhead: Sum64 -> sum64 -> sum64Go. The only
workarounds I can come up with are to either forgo my tests which
compare the Go+asm implementations, or else maintain two independent
copies of sum64Go (one for noasm and one for amd64). Neither of these
seems acceptable to me.
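One rough way to put a number on the Sum64 -> sum64 -> sum64Go chain is sketched below. The assumptions: the hash body is a placeholder FNV-1a loop, //go:noinline is added to every function so that each hop really is a call regardless of what the compiler would otherwise do, and bench and sink are hypothetical names. Both measurements go through the same function value, so the difference between them reflects only the two extra hops.

```go
package main

import (
	"fmt"
	"testing"
)

// Placeholder pure-Go implementation (FNV-1a, an assumption).
//
//go:noinline
func sum64Go(b []byte) uint64 {
	h := uint64(14695981039346656037)
	for _, c := range b {
		h ^= uint64(c)
		h *= 1099511628211
	}
	return h
}

// Forwarding function, as in x_noasm.go.
//
//go:noinline
func sum64(b []byte) uint64 { return sum64Go(b) }

// Exported entry point, as in x.go.
//
//go:noinline
func Sum64(b []byte) uint64 { return sum64(b) }

// sink keeps the compiler from discarding the results.
var sink uint64

// bench (hypothetical helper) times f on a fixed input.
func bench(f func([]byte) uint64, input []byte) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = f(input)
		}
	})
}

func main() {
	input := []byte("abcd") // a smallish input, where call overhead matters most
	fmt.Println("sum64Go directly:   ", bench(sum64Go, input))
	fmt.Println("Sum64 (two extra hops):", bench(Sum64, input))
}
```

The absolute numbers depend on the machine and Go version, so only the gap between the two lines is meaningful.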

Aram gave me the idea of using //go:linkname as a hacky workaround;
this doesn't work within a single package but I suppose I could
introduce an internal package for one of the implementations. That
seems fairly awful.

One tool change that would help with problem 1 but not problem 2 is if
a Go stub could have its body implemented in Go also. That is, x.go
could declare func Sum64(b []byte) uint64 with no body and then the
bodies would be provided in x_noasm.go (in Go) and x_amd64.s (in asm).

Another idea that would solve both problems is if the compiler could
inline the "forwarding" functions to avoid the extra call. Isn't that
much simpler than the general problem of inlining non-leaf functions?
I suppose there's still the stack trace issue.
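Whether a given compiler already inlines such forwarding functions can be checked directly. The sketch below reproduces the pure-Go call chain in one file (placeholder FNV-1a body, an assumption); building it with `go build -gcflags=-m` makes the compiler print its inlining decisions, and the result will vary by Go version.

```go
package main

import "fmt"

// Placeholder pure-Go implementation (FNV-1a, an assumption).
func sum64Go(b []byte) uint64 {
	h := uint64(14695981039346656037)
	for _, c := range b {
		h ^= uint64(c)
		h *= 1099511628211
	}
	return h
}

// Forwarding function, as in x_noasm.go.
func sum64(b []byte) uint64 { return sum64Go(b) }

// Exported entry point, as in x.go.
func Sum64(b []byte) uint64 { return sum64(b) }

func main() {
	// Build with:  go build -gcflags=-m .
	// and look for "can inline" / "inlining call to" lines that mention
	// sum64 and Sum64 in the compiler's diagnostics.
	fmt.Printf("%#x\n", Sum64([]byte("hello")))
}
```

This only covers the noasm path; a body-less declaration implemented in a .s file is opaque to the inliner, so the asm path has to be checked separately.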

Searching for that, I did find
https://github.com/golang/go/issues/8421. Any chance this could happen
in the near-ish term?

Or any other ideas?

-Caleb

Aram Hăvărneanu

Nov 18, 2016, 8:07:03 AM
to Caleb Spare, golang-nuts
On Fri, Nov 18, 2016 at 4:41 AM, Caleb Spare <ces...@gmail.com> wrote:
> Aram gave me the idea of using //go:linkname as a hacky workaround;
> this doesn't work within a single package

What do you mean exactly? It works here; in fact, I use this.

--
Aram Hăvărneanu

Damian Gryski

Nov 19, 2016, 12:54:00 AM
to golang-nuts
This is one of those tricks that should be in a "best practices for writing asm with Go" document.

Maybe we should start one on the wiki? It could cover things like "common build tags for disabling asm".

Damian
