Thanks Russ and Rob.
I've verified that, as you said, most of the speed comes back when I make the array local to the function or a field of a local struct. Using a pointer to the global array didn't seem to make a difference.
Also, is there a short explanation for why accessing a global array through a pointer might be faster than accessing it directly, even when we're not assigning the array to anything (and so not copying it)?
arrlen: 1024
BenchmarkArrayGlobal       2000000    860 ns/op
BenchmarkArrayGlobalPtr    2000000    872 ns/op
BenchmarkSliceGlobal       5000000    741 ns/op
BenchmarkArrayLocal        2000000    862 ns/op
BenchmarkSliceLocal        2000000    941 ns/op
BenchmarkArrayStructField  1000000   1090 ns/op
BenchmarkSliceStructField  1000000   1249 ns/op

arrlen: 2048
BenchmarkArrayGlobal       1000000   1742 ns/op
BenchmarkArrayGlobalPtr    1000000   1738 ns/op
BenchmarkSliceGlobal       1000000   1454 ns/op
BenchmarkArrayLocal        1000000   1734 ns/op
BenchmarkSliceLocal        1000000   1836 ns/op
BenchmarkArrayStructField  1000000   2160 ns/op
BenchmarkSliceStructField  1000000   2488 ns/op
Thanks.