On Tue, Apr 25, 2023 at 10:57 AM Sam Vilain <s...@vilain.net> wrote:
>
> (a) where can I find how this specific optimization is defined?
It's in the compiler. The definition is spread across several files,
so it's not especially easy to pull out. In this specific case, the
relevant pieces are something like:

cmd/compile/internal/ssagen/ssa.go:

    addF("math/bits", "TrailingZeros64",
        func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
            return s.newValue1(ssa.OpCtz64, types.Types[types.TINT], args[0])
        },
        sys.AMD64, sys.I386, sys.ARM64, sys.ARM, sys.S390X, sys.MIPS,
        sys.PPC64, sys.Wasm)

cmd/compile/internal/ssa/_gen/AMD64.rules:

    (Ctz64 x) && buildcfg.GOAMD64 >= 3 => (TZCNTQ x)

cmd/compile/internal/ssa/_gen/AMD64Ops.go:

    // count the number of trailing zero bits, prefer TZCNTQ over BSFQ, as TZCNTQ(0)==64
    // and BSFQ(0) is undefined. Same for TZCNTL(0)==32
    {name: "TZCNTQ", argLength: 1, reg: gp11, asm: "TZCNTQ", clobberFlags: true},

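
If it helps to see the effect rather than the definition, here is a minimal sketch of a program you could compile and inspect. The helper name lowestSetBitIndex is made up for this example, and the go:noinline directive is only there so the function shows up as its own symbol in the assembly listing.

```go
package main

import (
	"fmt"
	"math/bits"
)

// lowestSetBitIndex is a hypothetical helper, named only for this example.
// The call to bits.TrailingZeros64 is the one the compiler treats as an
// intrinsic on the architectures listed in the addF call above.
//
//go:noinline
func lowestSetBitIndex(x uint64) int {
	return bits.TrailingZeros64(x)
}

func main() {
	fmt.Println(lowestSetBitIndex(8))       // 8 is binary 1000
	fmt.Println(lowestSetBitIndex(1 << 40)) // only bit 40 is set
	fmt.Println(lowestSetBitIndex(0))       // math/bits documents 64 for zero
}
```

Building this with something like "GOAMD64=v3 go build -gcflags=-S" prints the generated assembly. With the rule quoted above I would expect TZCNTQ to appear in lowestSetBitIndex at that level, and a BSFQ-based sequence at the default level, but I have not checked every toolchain version.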
> (b) is it possible to write assembly functions that avoid the wrapper code, assuming that one follows the platform's calling convention?
Go uses its own calling convention rather than the platform's, so
following the platform convention doesn't avoid the wrapper. There is
an internal register-based ABI, but I don't think it's stable. See
https://go.googlesource.com/proposal/+/refs/heads/master/design/40724-register-calling.md.
Ian