Hi!
I am seeing a generally negative speed impact going from Go 1.13 to 1.14-RC1.
Running the benchmarks in my deflate package and removing the "no change" entries gives:
name old time/op new time/op delta
DecodeDigitsSpeed1e5-12 903µs ± 0% 940µs ± 1% +4.14% (p=0.008 n=5+5)
DecodeDigitsSpeed1e6-12 8.97ms ± 0% 9.40ms ± 1% +4.80% (p=0.008 n=5+5)
DecodeDigitsDefault1e4-12 93.2µs ± 0% 95.0µs ± 1% +1.97% (p=0.008 n=5+5)
DecodeDigitsDefault1e5-12 855µs ± 0% 882µs ± 2% +3.15% (p=0.008 n=5+5)
DecodeDigitsDefault1e6-12 8.58ms ± 0% 8.94ms ± 2% +4.28% (p=0.008 n=5+5)
DecodeDigitsCompress1e4-12 93.3µs ± 0% 94.6µs ± 1% +1.37% (p=0.016 n=4+5)
DecodeDigitsCompress1e5-12 976µs ± 0% 992µs ± 1% +1.60% (p=0.008 n=5+5)
DecodeDigitsCompress1e6-12 9.85ms ± 0% 9.97ms ± 1% +1.21% (p=0.016 n=4+5)
DecodeTwainSpeed1e4-12 93.7µs ± 0% 98.0µs ± 2% +4.60% (p=0.008 n=5+5)
DecodeTwainSpeed1e5-12 896µs ± 0% 902µs ± 0% +0.68% (p=0.008 n=5+5)
DecodeTwainDefault1e4-12 93.0µs ± 0% 95.1µs ± 1% +2.32% (p=0.008 n=5+5)
DecodeTwainDefault1e5-12 832µs ± 0% 840µs ± 0% +0.88% (p=0.008 n=5+5)
DecodeTwainDefault1e6-12 8.17ms ± 0% 8.22ms ± 0% +0.68% (p=0.008 n=5+5)
DecodeTwainCompress1e4-12 90.4µs ± 1% 93.1µs ± 1% +2.99% (p=0.008 n=5+5)
DecodeTwainCompress1e5-12 790µs ± 0% 802µs ± 0% +1.55% (p=0.008 n=5+5)
DecodeRandomSpeed1e4-12 288ns ± 2% 305ns ± 1% +5.91% (p=0.008 n=5+5)
DecodeRandomSpeed1e5-12 2.30µs ± 2% 2.24µs ± 1% -2.40% (p=0.008 n=5+5)
_tokens_EstimatedBits-12 651ns ± 0% 707ns ± 2% +8.67% (p=0.008 n=5+5)
EncodeDigitsConstant1e4-12 28.4µs ± 0% 29.4µs ± 0% +3.41% (p=0.016 n=5+4)
EncodeDigitsConstant1e5-12 307µs ± 0% 314µs ± 2% +2.41% (p=0.008 n=5+5)
EncodeDigitsConstant1e6-12 2.70ms ± 0% 2.77ms ± 1% +2.47% (p=0.008 n=5+5)
EncodeDigitsSpeed1e5-12 966µs ± 0% 988µs ± 0% +2.34% (p=0.008 n=5+5)
EncodeDigitsSpeed1e6-12 9.07ms ± 1% 9.22ms ± 1% +1.67% (p=0.032 n=5+5)
EncodeDigitsDefault1e5-12 1.63ms ± 0% 1.65ms ± 1% +1.17% (p=0.008 n=5+5)
EncodeDigitsCompress1e5-12 3.70ms ± 1% 3.64ms ± 1% -1.65% (p=0.008 n=5+5)
EncodeDigitsCompress1e6-12 40.1ms ± 0% 39.4ms ± 2% -1.61% (p=0.008 n=5+5)
EncodeDigitsSL1e5-12 955µs ± 0% 992µs ± 1% +3.79% (p=0.008 n=5+5)
EncodeDigitsSL1e6-12 9.34ms ± 0% 9.99ms ± 1% +6.92% (p=0.008 n=5+5)
EncodeTwainConstant1e4-12 37.6µs ± 2% 38.9µs ± 2% +3.51% (p=0.008 n=5+5)
EncodeTwainConstant1e5-12 337µs ± 0% 345µs ± 1% +2.38% (p=0.008 n=5+5)
EncodeTwainSpeed1e4-12 101µs ± 0% 102µs ± 0% +0.62% (p=0.024 n=5+5)
EncodeTwainSpeed1e5-12 955µs ± 0% 968µs ± 1% +1.35% (p=0.008 n=5+5)
EncodeTwainSpeed1e6-12 8.92ms ± 1% 9.09ms ± 1% +1.94% (p=0.032 n=5+5)
EncodeTwainDefault1e4-12 152µs ± 1% 160µs ± 1% +4.69% (p=0.008 n=5+5)
EncodeTwainDefault1e5-12 1.44ms ± 1% 1.49ms ± 1% +3.69% (p=0.008 n=5+5)
EncodeTwainDefault1e6-12 13.7ms ± 1% 14.2ms ± 2% +3.43% (p=0.008 n=5+5)
EncodeTwainCompress1e4-12 267µs ± 1% 272µs ± 2% +1.97% (p=0.008 n=5+5)
EncodeTwainCompress1e5-12 4.76ms ± 0% 4.81ms ± 0% +1.11% (p=0.008 n=5+5)
EncodeTwainCompress1e6-12 52.4ms ± 0% 53.0ms ± 1% +1.04% (p=0.008 n=5+5)
EncodeTwainSL1e4-12 101µs ± 1% 105µs ± 1% +4.48% (p=0.008 n=5+5)
EncodeTwainSL1e5-12 925µs ± 1% 949µs ± 1% +2.59% (p=0.008 n=5+5)
EncodeTwainSL1e6-12 8.86ms ± 1% 9.24ms ± 0% +4.28% (p=0.008 n=5+5)
`_tokens_EstimatedBits` is a microbenchmark, so the cause of its regression will probably be easier to identify. I will file a separate issue for that.
Running the benchmarks in my zstd package gives a less clear picture, but it still trends towards a performance loss:
name old time/op new time/op delta
Decoder_DecoderSmall/kppkn.gtb.zst-12 5.76ms ± 1% 5.87ms ± 2% +1.98% (p=0.016 n=5+5)
Decoder_DecoderSmall/geo.protodata.zst-12 1.53ms ± 0% 1.62ms ± 1% +5.86% (p=0.008 n=5+5)
Decoder_DecoderSmall/plrabn12.txt.zst-12 19.1ms ± 0% 18.7ms ± 1% -2.25% (p=0.008 n=5+5)
Decoder_DecoderSmall/lcet10.txt.zst-12 14.4ms ± 1% 13.6ms ± 0% -5.65% (p=0.008 n=5+5)
Decoder_DecoderSmall/html_x_4.zst-12 2.94ms ± 2% 3.00ms ± 0% +2.21% (p=0.008 n=5+5)
Decoder_DecoderSmall/paper-100k.pdf.zst-12 473µs ± 1% 511µs ± 1% +7.94% (p=0.008 n=5+5)
Decoder_DecoderSmall/fireworks.jpeg.zst-12 485µs ± 2% 511µs ± 4% +5.29% (p=0.008 n=5+5)
Decoder_DecoderSmall/html.zst-12 1.65ms ± 1% 1.71ms ± 1% +4.01% (p=0.008 n=5+5)
Decoder_DecoderSmall/comp-data.bin.zst-12 191µs ± 1% 206µs ± 1% +7.70% (p=0.008 n=5+5)
Decoder_DecodeAll/plrabn12.txt.zst-12 2.21ms ± 1% 2.19ms ± 1% -0.95% (p=0.032 n=5+5)
Decoder_DecodeAll/lcet10.txt.zst-12 1.63ms ± 1% 1.65ms ± 0% +1.20% (p=0.008 n=5+5)
Decoder_DecodeAll/alice29.txt.zst-12 726µs ± 0% 741µs ± 1% +2.06% (p=0.008 n=5+5)
Decoder_DecodeAll/paper-100k.pdf.zst-12 26.2µs ± 2% 28.3µs ± 3% +8.14% (p=0.008 n=5+5)
Decoder_DecodeAll/comp-data.bin.zst-12 11.7µs ± 1% 12.1µs ± 3% +3.21% (p=0.016 n=5+5)
Encoder_EncodeAllSimple/default-12 496µs ± 1% 491µs ± 1% -0.85% (p=0.008 n=5+5)
Encoder_EncodeAllSimple4K/fastest-12 28.6µs ± 1% 29.1µs ± 1% +1.75% (p=0.008 n=5+5)
RandomEncodeAllDefault-12 4.73ms ± 1% 4.82ms ± 1% +1.82% (p=0.016 n=5+5)
RandomEncoderFastest-12 4.30ms ± 3% 4.24ms ± 1% -1.34% (p=0.032 n=5+5)
Snappy_ConvertXML-12 14.1ms ± 0% 13.9ms ± 0% -1.55% (p=0.008 n=5+5)
But there are some quite big regressions in there, definitely not what I expected.
Finally, the S2 benchmarks with assembly disabled show some variance: some rather big losses (mostly encoding) and some rather big wins (mostly decoding, which is mostly memcopy).
name old time/op new time/op delta
DecodeS2Block/0-html/block-better-12 13.9µs ± 1% 13.6µs ± 0% -2.45% (p=0.016 n=5+4)
DecodeS2Block/2-jpg/block-better-12 1.27µs ± 0% 1.24µs ± 1% -1.78% (p=0.008 n=5+5)
DecodeS2Block/4-pdf/block-better-12 2.72µs ± 2% 2.65µs ± 1% -2.72% (p=0.008 n=5+5)
DecodeS2Block/5-html4/block-better-12 40.2µs ± 4% 37.7µs ± 2% -6.20% (p=0.016 n=5+5)
DecodeS2Block/6-txt1/block-better-12 60.4µs ± 1% 63.0µs ± 2% +4.26% (p=0.008 n=5+5)
DecodeS2Block/8-txt3/block-better-12 162µs ± 5% 155µs ± 0% -4.65% (p=0.008 n=5+5)
DecodeS2Block/9-txt4/block-12 161µs ± 1% 157µs ± 0% -2.62% (p=0.008 n=5+5)
DecodeS2Block/9-txt4/block-better-12 224µs ± 1% 216µs ± 1% -3.55% (p=0.008 n=5+5)
DecodeS2Block/10-pb/block-12 10.7µs ± 0% 10.6µs ± 1% -1.43% (p=0.008 n=5+5)
DecodeS2Block/10-pb/block-better-12 12.1µs ± 0% 11.8µs ± 0% -1.90% (p=0.008 n=5+5)
DecodeS2Block/11-gaviota/block-12 51.9µs ± 1% 51.2µs ± 0% -1.36% (p=0.032 n=5+5)
DecodeS2Block/12-txt1_128b/block-12 18.0ns ± 1% 17.6ns ± 0% -2.22% (p=0.000 n=5+4)
DecodeS2Block/12-txt1_128b/block-better-12 18.3ns ± 1% 17.7ns ± 0% -3.07% (p=0.016 n=5+4)
DecodeS2Block/13-txt1_1000b/block-12 73.1ns ± 2% 70.2ns ± 0% -4.02% (p=0.008 n=5+5)
DecodeS2Block/13-txt1_1000b/block-better-12 183ns ± 5% 174ns ± 1% -4.92% (p=0.008 n=5+5)
DecodeS2Block/14-txt1_10000b/block-12 1.45µs ± 1% 1.40µs ± 0% -3.08% (p=0.008 n=5+5)
DecodeS2Block/14-txt1_10000b/block-better-12 3.48µs ± 1% 3.68µs ± 0% +5.86% (p=0.008 n=5+5)
DecodeS2Block/15-txt1_20000b/block-12 4.36µs ± 0% 4.54µs ± 0% +4.08% (p=0.008 n=5+5)
DecodeS2Block/15-txt1_20000b/block-better-12 8.54µs ± 0% 8.27µs ± 0% -3.20% (p=0.008 n=5+5)
EncodeS2Block/0-html/block-12 26.6µs ± 0% 26.4µs ± 0% -0.65% (p=0.008 n=5+5)
EncodeS2Block/0-html/block-better-12 46.3µs ± 1% 46.6µs ± 0% +0.71% (p=0.016 n=5+5)
EncodeS2Block/1-urls/block-better-12 567µs ± 0% 573µs ± 0% +1.00% (p=0.008 n=5+5)
EncodeS2Block/2-jpg/block-better-12 4.61µs ± 3% 4.44µs ± 1% -3.76% (p=0.008 n=5+5)
EncodeS2Block/3-jpg_200b/block-12 282ns ± 2% 274ns ± 0% -2.83% (p=0.008 n=5+5)
EncodeS2Block/5-html4/block-better-12 59.1µs ± 1% 61.1µs ± 0% +3.40% (p=0.008 n=5+5)
EncodeS2Block/6-txt1/block-12 96.2µs ± 0% 97.1µs ± 0% +0.98% (p=0.008 n=5+5)
EncodeS2Block/6-txt1/block-better-12 165µs ± 0% 169µs ± 1% +2.44% (p=0.016 n=4+5)
EncodeS2Block/7-txt2/block-12 80.7µs ± 0% 81.9µs ± 1% +1.55% (p=0.016 n=4+5)
EncodeS2Block/7-txt2/block-better-12 148µs ± 0% 152µs ± 1% +2.26% (p=0.016 n=5+4)
EncodeS2Block/8-txt3/block-12 270µs ± 0% 276µs ± 1% +1.96% (p=0.008 n=5+5)
EncodeS2Block/8-txt3/block-better-12 434µs ± 0% 441µs ± 0% +1.72% (p=0.016 n=4+5)
EncodeS2Block/9-txt4/block-12 356µs ± 2% 346µs ± 1% -2.84% (p=0.008 n=5+5)
EncodeS2Block/10-pb/block-better-12 40.2µs ± 0% 41.2µs ± 2% +2.49% (p=0.008 n=5+5)
EncodeS2Block/11-gaviota/block-12 80.7µs ± 0% 81.7µs ± 0% +1.30% (p=0.008 n=5+5)
EncodeS2Block/11-gaviota/block-better-12 128µs ± 0% 131µs ± 6% +2.45% (p=0.008 n=5+5)
EncodeS2Block/13-txt1_1000b/block-12 607ns ± 2% 598ns ± 0% -1.52% (p=0.024 n=5+5)
EncodeS2Block/14-txt1_10000b/block-12 5.49µs ± 2% 5.06µs ± 1% -7.79% (p=0.008 n=5+5)
EncodeS2Block/15-txt1_20000b/block-12 12.3µs ± 1% 12.9µs ± 4% +4.84% (p=0.016 n=5+5)
EncodeS2Block/15-txt1_20000b/block-better-12 27.2µs ± 0% 28.7µs ± 2% +5.40% (p=0.008 n=5+5)
DecodeSnappyBlock/0-html/s2-snappy-12 17.0µs ± 0% 16.8µs ± 0% -1.09% (p=0.008 n=5+5)
DecodeSnappyBlock/1-urls/s2-snappy-12 197µs ± 1% 199µs ± 1% +1.04% (p=0.032 n=5+5)
DecodeSnappyBlock/2-jpg/s2-snappy-12 1.26µs ± 1% 1.25µs ± 1% -1.08% (p=0.032 n=5+5)
DecodeSnappyBlock/3-jpg_200b/snappy-12 29.7ns ± 9% 26.9ns ± 0% -9.42% (p=0.008 n=5+5)
DecodeSnappyBlock/3-jpg_200b/s2-snappy-12 28.0ns ± 2% 26.8ns ± 1% -4.07% (p=0.008 n=5+5)
DecodeSnappyBlock/5-html4/s2-snappy-12 70.5µs ± 2% 68.2µs ± 1% -3.26% (p=0.016 n=4+5)
These are microbenchmarks, which tend to over-emphasize differences, so I would say that overall it looks like a 2% loss.