On 25 April 2015 at 19:10, Ian Lance Taylor <ia...@golang.org> wrote:
Hi Ian,
> On Sat, Apr 25, 2015 at 7:55 AM, Will Newton <will....@cocoon.life> wrote:
>>
>> I had a look at the profile briefly and it is spending 80+% in Go code
>> so it doesn't appear to be GC or allocator overhead, and my
>> understanding is the version of gccgo I had does escape analysis which
>> has often been cited in the past as the big missing feature in that
>> toolchain.
Thanks for the fast and informative reply!
> Chris has added escape analysis to gccgo but it is not yet turned on by
> default. There is at least one bug in the code. Also, the escape
> analysis is not yet as effective as it should become.
>
> You can turn it on by passing -fgo-optimize-allocs when you compile.
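To make the escape-analysis point concrete, here is a small sketch; the names noEscape, escapes, sink and allocCounts are made up for illustration. It uses testing.AllocsPerRun to count heap allocations per call. Under gc, whose escape analysis is on by default (`go build -gcflags=-m` prints its decisions), the expectation is 0 allocations for noEscape and 1 for escapes. A compiler without escape analysis would heap-allocate in both cases, which is presumably what gccgo does unless -fgo-optimize-allocs is given.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable; storing a pointer in it makes the
// pointed-to value outlive the call, so it has to be heap allocated.
var sink *[64]int

// noEscape uses a 512-byte array that never leaves the function. A compiler
// with escape analysis can place it on the stack; without escape analysis,
// new() typically means a heap allocation on every call.
func noEscape() int {
	buf := new([64]int)
	for i := range buf {
		buf[i] = i
	}
	sum := 0
	for _, v := range buf {
		sum += v
	}
	return sum
}

// escapes publishes its array through sink, so it cannot be stack allocated.
func escapes() {
	buf := new([64]int)
	buf[0] = 1
	sink = buf
}

// allocCounts reports the average number of heap allocations per call.
func allocCounts() (noEsc, esc float64) {
	noEsc = testing.AllocsPerRun(100, func() { _ = noEscape() })
	esc = testing.AllocsPerRun(100, escapes)
	return noEsc, esc
}

func main() {
	n, e := allocCounts()
	fmt.Printf("noEscape: %v allocs/call, escapes: %v allocs/call\n", n, e)
}
```

The counts are compiler-dependent, so treat them as something to measure with each toolchain rather than a guarantee.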
>
> Also of course when using gccgo and llgo make sure you are compiling
> with optimization. It's not the default.
I wasn't aware of that. I guess it is in line with how gcc operates in
general but different from how gc works by default.
> I took a look at the benchmark. The dominating element is the worker
> nested function in radix2FFT in radix2.go. I see two aspects that are
> slowing down the gccgo code. The first is that gc is doing much
> better escape analysis on the various enclosing variables referenced
> by the nested function. That reduces the number of memory loads.
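The shape described here, a hot nested function reading variables from its enclosing function, can be sketched as follows. This is a hypothetical radix-2 FFT written for illustration, not the go-dsp radix2.go code; fftRadix2, worker and dft are invented names. A variable captured by a closure may be reached through the closure's context rather than held in a register, so each access in the inner loop can cost a memory load; copying captured values into locals at the top of the closure is a common manual workaround. Whether it helps depends on the compiler, since gc may already capture such variables by value.

```go
package main

import (
	"fmt"
	"math"
	"math/cmplx"
)

// fftRadix2 is a hypothetical in-place iterative radix-2 FFT with a nested
// worker function; it is not go-dsp's code. len(x) must be a power of two.
func fftRadix2(x []complex128) {
	n := len(x)
	// Reorder the input into bit-reversed index order.
	for i, j := 1, 0; i < n; i++ {
		bit := n >> 1
		for ; j&bit != 0; bit >>= 1 {
			j ^= bit
		}
		j |= bit
		if i < j {
			x[i], x[j] = x[j], x[i]
		}
	}
	for size := 2; size <= n; size <<= 1 {
		half := size / 2
		// Twiddle ratio between neighbouring butterflies: exp(-2*pi*i/size).
		step := cmplx.Exp(complex(0, -2*math.Pi/float64(size)))

		// worker handles the blocks in [lo, hi); lo and hi must be multiples
		// of size. It refers to the enclosing x, half, size and step; copying
		// them into locals may let the compiler keep them in registers
		// instead of reloading them through the closure context.
		worker := func(lo, hi int) {
			xs, h, sz, st := x, half, size, step
			for start := lo; start < hi; start += sz {
				w := complex(1, 0)
				for k := 0; k < h; k++ {
					u := xs[start+k]
					t := w * xs[start+k+h] // complex multiplication in the hot loop
					xs[start+k] = u + t
					xs[start+k+h] = u - t
					w *= st
				}
			}
		}
		worker(0, n)
	}
}

// dft is the O(n^2) definition X[k] = sum_t x[t]*exp(-2*pi*i*k*t/n),
// used here only as a reference to check fftRadix2.
func dft(x []complex128) []complex128 {
	n := len(x)
	out := make([]complex128, n)
	for k := range out {
		for t, v := range x {
			angle := -2 * math.Pi * float64(k*t) / float64(n)
			out[k] += v * cmplx.Exp(complex(0, angle))
		}
	}
	return out
}

func main() {
	in := []complex128{1, 2, 3, 4, 5, 6, 7, 8}
	got := append([]complex128(nil), in...)
	fftRadix2(got)
	want := dft(in)
	for k := range got {
		fmt.Printf("X[%d] fft=%.3f dft=%.3f\n", k, got[k], want[k])
	}
}
```

The inner loop also contains the `w * xs[...]` complex multiplication, which is the second cost discussed in the following paragraph.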
>
> The second, and larger, aspect is that gc is inlining the complex
> multiplication rather than calling out to a supporting function. When
> I use -ffast-math with gccgo, I get benchmark results comparable to
> gc. Using -ffast-math tells gccgo that it doesn't need to worry about
> getting exactly correct results for infinity and NaN values when doing
> complex multiplication.
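The infinity/NaN issue can be shown with a sketch; the function names are hypothetical. mulNaive is the plain formula, roughly what an inlined multiply computes. mulAnnexG is a Go transcription of the recovery logic in the example multiplication function from C99 Annex G, which is presumably close to what GCC's out-of-line helper (libgcc's `__muldc3` for complex128) does. In GCC, -ffast-math turns on -fcx-limited-range, which drops that NaN+NaNi check and allows the simple formula.

```go
package main

import (
	"fmt"
	"math"
)

// mulNaive is the textbook product (a+bi)(c+di) = (ac-bd) + (ad+bc)i with
// no special cases. The explicit float64 conversions prevent the compiler
// from fusing a product and a sum into one FMA, so each step is rounded.
func mulNaive(z, w complex128) complex128 {
	a, b, c, d := real(z), imag(z), real(w), imag(w)
	return complex(float64(a*c)-float64(b*d), float64(a*d)+float64(b*c))
}

// mulAnnexG adds the infinity-recovery step of the C99 Annex G example:
// if the naive result is NaN+NaNi although an operand or an intermediate
// product is infinite, it recomputes and returns an infinity.
func mulAnnexG(z, w complex128) complex128 {
	a, b, c, d := real(z), imag(z), real(w), imag(w)
	ac, bd := float64(a*c), float64(b*d)
	ad, bc := float64(a*d), float64(b*c)
	x, y := ac-bd, ad+bc
	if !(math.IsNaN(x) && math.IsNaN(y)) {
		return complex(x, y) // fast path: the naive result is kept
	}
	// box maps ±Inf to ±1 and anything else to ±0, keeping the sign.
	box := func(v float64) float64 {
		if math.IsInf(v, 0) {
			return math.Copysign(1, v)
		}
		return math.Copysign(0, v)
	}
	// unNaN replaces a NaN with a signed zero.
	unNaN := func(v float64) float64 {
		if math.IsNaN(v) {
			return math.Copysign(0, v)
		}
		return v
	}
	recalc := false
	if math.IsInf(a, 0) || math.IsInf(b, 0) { // z is infinite
		a, b = box(a), box(b)
		c, d = unNaN(c), unNaN(d)
		recalc = true
	}
	if math.IsInf(c, 0) || math.IsInf(d, 0) { // w is infinite
		c, d = box(c), box(d)
		a, b = unNaN(a), unNaN(b)
		recalc = true
	}
	if !recalc && (math.IsInf(ac, 0) || math.IsInf(bd, 0) ||
		math.IsInf(ad, 0) || math.IsInf(bc, 0)) {
		// An intermediate product overflowed: drop NaNs and recompute.
		a, b, c, d = unNaN(a), unNaN(b), unNaN(c), unNaN(d)
		recalc = true
	}
	if recalc {
		inf := math.Inf(1)
		x = inf * (float64(a*c) - float64(b*d))
		y = inf * (float64(a*d) + float64(b*c))
	}
	return complex(x, y)
}

func main() {
	inf := math.Inf(1)
	z, w := complex(inf, inf), complex(1, 0)
	fmt.Println("naive:  ", mulNaive(z, w))
	fmt.Println("annex G:", mulAnnexG(z, w))
	// For ordinary finite inputs the two agree.
	fmt.Println(mulNaive(complex(1, 2), complex(3, 4)), mulAnnexG(complex(1, 2), complex(3, 4)))
}
```

With the inputs in main, mulNaive gives NaN+NaNi for (Inf+Inf i)(1+0i) while mulAnnexG recovers +Inf+Inf i. The function call and the extra NaN check are the overhead that -ffast-math lets gccgo skip, at the price of results like NaN+NaNi for some infinite inputs.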
With C code I have always tended to avoid -ffast-math, as it behaves in
various IEEE-noncompliant ways. As far as I can tell Go makes no
specific claims about IEEE 754 compliance, so I guess it is free to do
this type of optimization by default?
I tried running the code with -O3 -ffast-math -fgo-optimize-allocs for
gccgo and -O3 for llgo:
gccgo:
BenchmarkFFT 3 423200666 ns/op
testing: BenchmarkFFT left GOMAXPROCS set to 4
ok github.com/mjibson/go-dsp/fft 2.758s
llgo:
BenchmarkFFT 10 245246700 ns/op
testing: BenchmarkFFT left GOMAXPROCS set to 4
ok github.com/mjibson/go-dsp/fft 3.008s
gc for comparison:
BenchmarkFFT 3 402457353 ns/op
testing: BenchmarkFFT left GOMAXPROCS set to 4
ok github.com/mjibson/go-dsp/fft 1.682s
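The ns/op column is the per-operation figure to compare. The trailing elapsed time on the `ok` line covers the whole test binary, including the runs the testing package uses to choose the iteration count (3 for gc and gccgo, 10 for llgo here). A small sketch that parses the result lines above and expresses each as a multiple of gc's time per op; nsPerOp is a hypothetical helper, not part of the testing package.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// nsPerOp extracts the ns/op figure from a benchmark result line such as
// "BenchmarkFFT 3 423200666 ns/op" (the number just before "ns/op").
func nsPerOp(line string) (float64, error) {
	f := strings.Fields(line)
	for i := 1; i < len(f); i++ {
		if f[i] == "ns/op" {
			return strconv.ParseFloat(f[i-1], 64)
		}
	}
	return 0, fmt.Errorf("no ns/op field in %q", line)
}

func main() {
	results := []struct{ name, line string }{
		{"gccgo", "BenchmarkFFT 3 423200666 ns/op"},
		{"llgo", "BenchmarkFFT 10 245246700 ns/op"},
		{"gc", "BenchmarkFFT 3 402457353 ns/op"},
	}
	base, err := nsPerOp(results[2].line) // gc is the baseline
	if err != nil {
		panic(err)
	}
	for _, r := range results {
		ns, err := nsPerOp(r.line)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%-6s %6.1f ms/op  %.2fx gc's time\n", r.name, ns/1e6, ns/base)
	}
}
```

On these numbers gccgo takes about 1.05 times gc's time per op, and llgo about 0.61 times.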
So it looks like llgo is either doing really well here (AFAIK it
doesn't do escape analysis?) or I made some kind of mistake...