gccgo produces much slower code than gc


Tamir Duberstein

Feb 14, 2018, 16:20:11
to: golan...@googlegroups.com
Running the benchmarks in github.com/cockroachdb/cockroach/pkg/roachpb:

PATH=$HOME/local/go1.10/bin:$PATH
go test -i ./pkg/roachpb && go test -run - -bench . ./pkg/roachpb -count 5 -benchmem > gc.txt

PATH=$GCCROOT/gcc/bin:$PATH
go test -i ./pkg/roachpb && go test -run - -bench . ./pkg/roachpb -count 5 -benchmem > gccgo.txt

benchstat gc.txt gccgo.txt
name                old time/op    new time/op    delta
ValueSetBytes-16      38.7ns ± 1%   188.8ns ±44%  +388.36%  (p=0.008 n=5+5)
ValueSetFloat-16      27.6ns ± 1%   112.4ns ± 4%  +306.95%  (p=0.008 n=5+5)
ValueSetBool-16       29.5ns ± 0%    69.5ns ± 7%  +135.59%  (p=0.008 n=5+5)
ValueSetInt-16        35.9ns ± 1%    89.0ns ± 5%  +147.83%  (p=0.008 n=5+5)
ValueSetProto-16      45.5ns ± 0%   127.4ns ± 0%  +180.00%  (p=0.008 n=5+5)
ValueSetTime-16       52.4ns ± 1%   136.4ns ± 0%  +160.40%  (p=0.008 n=5+5)
ValueSetDecimal-16    95.5ns ± 1%   255.0ns ± 0%  +166.96%  (p=0.008 n=5+5)
ValueSetTuple-16      38.7ns ± 1%   116.0ns ± 0%  +200.05%  (p=0.016 n=5+4)
ValueGetBytes-16      9.22ns ± 0%   31.60ns ± 0%  +242.66%  (p=0.008 n=5+5)
ValueGetFloat-16      12.0ns ± 0%    49.9ns ± 0%  +315.83%  (p=0.016 n=4+5)
ValueGetBool-16       14.4ns ± 0%    39.3ns ± 0%  +172.92%  (p=0.029 n=4+4)
ValueGetInt-16        13.7ns ± 0%    37.3ns ± 0%  +172.12%  (p=0.016 n=4+5)
ValueGetProto-16      26.1ns ± 0%    60.5ns ± 0%  +131.72%  (p=0.016 n=4+5)
ValueGetTime-16       39.6ns ± 0%   172.0ns ± 0%  +334.34%  (p=0.008 n=5+5)
ValueGetDecimal-16    95.1ns ± 0%   264.0ns ± 0%  +177.49%  (p=0.008 n=5+5)
ValueGetTuple-16      9.84ns ± 0%   31.50ns ± 0%  +220.25%  (p=0.008 n=5+5)
name                old alloc/op   new alloc/op   delta
ValueSetBytes-16       32.0B ± 0%     32.0B ± 0%      ~     (all equal)
ValueSetFloat-16       16.0B ± 0%     16.0B ± 0%      ~     (all equal)
ValueSetBool-16        8.00B ± 0%     8.00B ± 0%      ~     (all equal)
ValueSetInt-16         16.0B ± 0%     16.0B ± 0%      ~     (all equal)
ValueSetProto-16       8.00B ± 0%     8.00B ± 0%      ~     (all equal)
ValueSetTime-16        16.0B ± 0%     16.0B ± 0%      ~     (all equal)
ValueSetDecimal-16     32.0B ± 0%     32.0B ± 0%      ~     (all equal)
ValueSetTuple-16       32.0B ± 0%     32.0B ± 0%      ~     (all equal)
ValueGetBytes-16       0.00B          0.00B           ~     (all equal)
ValueGetFloat-16       0.00B          0.00B           ~     (all equal)
ValueGetBool-16        0.00B          0.00B           ~     (all equal)
ValueGetInt-16         0.00B          0.00B           ~     (all equal)
ValueGetProto-16       0.00B          0.00B           ~     (all equal)
ValueGetTime-16        0.00B          0.00B           ~     (all equal)
ValueGetDecimal-16     48.0B ± 0%     48.0B ± 0%      ~     (all equal)
ValueGetTuple-16       0.00B          0.00B           ~     (all equal)
name                old allocs/op  new allocs/op  delta
ValueSetBytes-16        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
ValueSetFloat-16        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
ValueSetBool-16         1.00 ± 0%      1.00 ± 0%      ~     (all equal)
ValueSetInt-16          1.00 ± 0%      1.00 ± 0%      ~     (all equal)
ValueSetProto-16        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
ValueSetTime-16         1.00 ± 0%      1.00 ± 0%      ~     (all equal)
ValueSetDecimal-16      1.00 ± 0%      1.00 ± 0%      ~     (all equal)
ValueSetTuple-16        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
ValueGetBytes-16        0.00           0.00           ~     (all equal)
ValueGetFloat-16        0.00           0.00           ~     (all equal)
ValueGetBool-16         0.00           0.00           ~     (all equal)
ValueGetInt-16          0.00           0.00           ~     (all equal)
ValueGetProto-16        0.00           0.00           ~     (all equal)
ValueGetTime-16         0.00           0.00           ~     (all equal)
ValueGetDecimal-16      1.00 ± 0%      1.00 ± 0%      ~     (all equal)
ValueGetTuple-16        0.00           0.00           ~     (all equal)
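The delta column appears to be the relative change from the old (gc) mean to the new (gccgo) mean, expressed as a percentage. The Go sketch below assumes that definition. `pctDelta` is a hypothetical helper, not benchstat code. Recomputing from the rounded means shown in the table gives about +387.86% for ValueSetBytes-16, not the reported +388.36%. The gap is presumably because benchstat works from the unrounded means.

```go
package main

import "fmt"

// pctDelta returns the relative change from oldMean to newMean as a
// percentage; for a time/op metric, positive means the new build is slower.
func pctDelta(oldMean, newMean float64) float64 {
	return (newMean/oldMean - 1) * 100
}

func main() {
	// Rounded time/op means for ValueSetBytes-16 (gc vs. gccgo) from the table.
	fmt.Printf("ValueSetBytes-16: %+.2f%%\n", pctDelta(38.7, 188.8))
	// The same comparison expressed as a slowdown factor (new/old).
	fmt.Printf("slowdown: %.2fx\n", 188.8/38.7)
}
```

Read this way, a delta of roughly +388% means gccgo needs almost five times as long per operation as gc.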

I chose this package because it doesn't depend on any of the fancy Makefile magic in the CockroachDB repo; you should be able to reproduce these results using just the go tool.

Are these results expected? I did minimal digging using pprof and perf, but nothing obvious jumps out; things are just slower across the board. These results are on linux/amd64.

Ian Lance Taylor

Feb 14, 2018, 19:44:55
to: Tamir Duberstein, golang-nuts
Which version of gccgo?

Ian

Tamir Duberstein

Feb 15, 2018, 09:43:33
to: Ian Lance Taylor, golang-nuts

Ian Lance Taylor

Feb 15, 2018, 13:59:32
to: Tamir Duberstein, golang-nuts
On Thu, Feb 15, 2018 at 6:42 AM, Tamir Duberstein <tam...@gmail.com> wrote:
> Built at this revision:
> https://github.com/gcc-mirror/gcc/commit/a82f431e184a9ac922ad43df73cdcc702ab0f279

Thanks. What do you see from

go test -gccgoflags="-g -O2"

?

Ian

Ian Lance Taylor

Feb 15, 2018, 14:00:34
to: Tamir Duberstein, golang-nuts
On Thu, Feb 15, 2018 at 10:59 AM, Ian Lance Taylor <ia...@golang.org> wrote:
> On Thu, Feb 15, 2018 at 6:42 AM, Tamir Duberstein <tam...@gmail.com> wrote:
>> Built at this revision:
>> https://github.com/gcc-mirror/gcc/commit/a82f431e184a9ac922ad43df73cdcc702ab0f279
>
> Thanks. What do you see from
>
> go test -gccgoflags="-g -O2"
>
> ?

Sorry, make that

go test -gccgoflags=all="-g -O2"

Tamir Duberstein

Feb 15, 2018, 14:32:43
to: Ian Lance Taylor, golang-nuts
What does all do? Anyway, the results are better, but still not "good":

name                old time/op    new time/op    delta
ValueSetBytes-16      38.5ns ± 0%   105.8ns ± 4%  +174.81%  (p=0.008 n=5+5)
ValueSetFloat-16      27.5ns ± 1%    73.2ns ± 1%  +166.38%  (p=0.008 n=5+5)
ValueSetBool-16       29.4ns ± 0%    52.2ns ± 5%   +77.77%  (p=0.016 n=4+5)
ValueSetInt-16        34.0ns ± 1%    74.8ns ± 1%  +119.62%  (p=0.008 n=5+5)
ValueSetProto-16      45.4ns ± 0%    87.8ns ± 1%   +93.57%  (p=0.008 n=5+5)
ValueSetTime-16       52.9ns ± 1%   111.4ns ±18%  +110.67%  (p=0.008 n=5+5)
ValueSetDecimal-16    94.6ns ± 0%   214.2ns ±36%  +126.43%  (p=0.008 n=5+5)
ValueSetTuple-16      38.7ns ± 0%   105.6ns ± 3%  +172.87%  (p=0.008 n=5+5)
ValueGetBytes-16      9.22ns ± 0%   11.60ns ± 0%   +25.84%  (p=0.008 n=5+5)
ValueGetFloat-16      12.0ns ± 0%    23.8ns ± 0%   +97.67%  (p=0.016 n=5+4)
ValueGetBool-16       14.4ns ± 0%    15.2ns ± 0%    +5.56%  (p=0.029 n=4+4)
ValueGetInt-16        13.7ns ± 0%    14.6ns ± 0%    +6.57%  (p=0.016 n=5+4)
ValueGetProto-16      26.1ns ± 0%    22.6ns ± 0%   -13.41%  (p=0.008 n=5+5)
ValueGetTime-16       41.0ns ± 4%    78.9ns ± 0%   +92.68%  (p=0.008 n=5+5)
ValueGetDecimal-16     130ns ±24%     183ns ± 1%   +40.29%  (p=0.008 n=5+5)
ValueGetTuple-16      9.87ns ± 1%   11.60ns ± 0%   +17.58%  (p=0.008 n=5+5)
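In the -O2 table, the Set benchmarks show large slowdowns. Several Get benchmarks are near parity, and one is faster. A common way to condense such a table into one figure is the geometric mean of the per-benchmark new/old time ratios; the thread itself does not do this. The sketch below uses a hypothetical helper, `geomeanRatio`, and assumes every mean is positive. The three benchmarks it uses are only a sample copied from the table.

```go
package main

import (
	"fmt"
	"math"
)

// geomeanRatio returns the geometric mean of newVals[i]/oldVals[i].
// It assumes both slices are non-empty, have the same length, and hold
// only positive values.
func geomeanRatio(oldVals, newVals []float64) float64 {
	sumLog := 0.0
	for i := range oldVals {
		sumLog += math.Log(newVals[i] / oldVals[i])
	}
	return math.Exp(sumLog / float64(len(oldVals)))
}

func main() {
	// time/op means (ns) for ValueSetBytes, ValueGetBytes and ValueGetProto,
	// gc vs. gccgo with -gccgoflags=all="-g -O2", copied from the table.
	gc := []float64{38.5, 9.22, 26.1}
	gccgo := []float64{105.8, 11.60, 22.6}
	fmt.Printf("geomean time ratio (gccgo/gc): %.2fx\n", geomeanRatio(gc, gccgo))
}
```

The geometric mean is used instead of an arithmetic mean of ratios because it treats slowdowns and speedups symmetrically. A 2x slowdown and a 2x speedup together average to 1x.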

Ian Lance Taylor

Feb 15, 2018, 19:38:21
to: Tamir Duberstein, golang-nuts
On Thu, Feb 15, 2018 at 11:31 AM, Tamir Duberstein <tam...@gmail.com> wrote:
> What does all do? Anyway, the results are better, but still not "good":

Using "all" applies the options to all packages, not just the one being built.
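The shell strips the double quotes in `-gccgoflags=all="-g -O2"`, so the go tool receives a single argument: `-gccgoflags=all=-g -O2`. Here `all=` is the package pattern, and the rest is the flag list handed to gccgo. The Go sketch below builds the same invocation with os/exec. No shell is involved there, so the quotes must be left out. It assumes, as in the original commands, that the `go` found on PATH is the one installed with GCC. `gccgoTestCmd` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os/exec"
)

// gccgoTestCmd builds, but does not run, a "go test" invocation that passes
// gccgoFlags to the compiler for every package in the build (the "all="
// pattern), not only for the package under test.
func gccgoTestCmd(pkg, gccgoFlags string) *exec.Cmd {
	return exec.Command("go", "test",
		// One argument with no inner quotes: exec.Command does no shell parsing.
		"-gccgoflags=all="+gccgoFlags,
		"-run", "-", "-bench", ".", "-count", "5", "-benchmem",
		pkg)
}

func main() {
	cmd := gccgoTestCmd("./pkg/roachpb", "-g -O2")
	// Print the argument vector exactly as the go tool would receive it.
	fmt.Printf("%q\n", cmd.Args)
}
```

To actually run the benchmarks, set `cmd.Stdout = os.Stdout` and call `cmd.Run()`.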

Thanks for the benchmarks, it's something to look at.

Ian

Tamir Duberstein

May 3, 2018, 15:54:12
to: Ian Lance Taylor, golang-nuts
Looks like performance is virtually identical in GCC 8.1:

name                old time/op    new time/op    delta
ValueSetBytes-16      39.1ns ± 1%   104.5ns ± 0%  +167.26%  (p=0.029 n=4+4)
ValueSetFloat-16      25.9ns ± 1%    67.8ns ± 0%  +161.78%  (p=0.029 n=4+4)
ValueSetBool-16       27.5ns ± 0%    54.0ns ± 1%   +96.18%  (p=0.029 n=4+4)
ValueSetInt-16        35.1ns ± 4%    75.0ns ± 0%  +113.51%  (p=0.029 n=4+4)
ValueSetProto-16      45.9ns ± 0%    86.7ns ± 0%   +88.89%  (p=0.029 n=4+4)
ValueSetTime-16       52.7ns ± 1%   100.2ns ± 1%   +90.32%  (p=0.029 n=4+4)
ValueSetDecimal-16    88.9ns ± 1%   177.2ns ± 0%   +99.33%  (p=0.029 n=4+4)
ValueSetTuple-16      39.4ns ± 1%   104.5ns ± 0%  +165.23%  (p=0.029 n=4+4)
ValueGetBytes-16      9.48ns ± 0%   11.62ns ± 1%   +22.66%  (p=0.029 n=4+4)
ValueGetFloat-16      13.3ns ± 0%    23.9ns ± 1%   +79.70%  (p=0.029 n=4+4)
ValueGetBool-16       14.8ns ± 1%    15.2ns ± 0%    +2.36%  (p=0.029 n=4+4)
ValueGetInt-16        14.0ns ± 0%    14.6ns ± 0%    +4.29%  (p=0.029 n=4+4)
ValueGetProto-16      26.4ns ± 0%    22.7ns ± 0%   -14.08%  (p=0.029 n=4+4)
ValueGetTime-16       39.6ns ± 0%    79.0ns ± 0%   +99.24%  (p=0.029 n=4+4)
ValueGetDecimal-16     101ns ± 3%     182ns ± 0%   +80.04%  (p=0.029 n=4+4)
ValueGetTuple-16      9.14ns ± 0%   11.60ns ± 0%   +26.85%  (p=0.029 n=4+4)