code review 6127064: compress/flate: optimize history-copy decoding. (issue 6127064)

58 views
Skip to first unread message

nige...@golang.org

unread,
Apr 30, 2012, 1:11:42 AM4/30/12
to r...@golang.org, da...@cheney.net, golan...@googlegroups.com, kra...@golang.org, re...@codereview-hr.appspotmail.com
Reviewers: rsc,

Message:
Hello r...@golang.org (cc: da...@cheney.net, golan...@googlegroups.com,
kra...@golang.org),

I'd like you to review this change to
https://go.googlecode.com/hg/


Description:
compress/flate: optimize history-copy decoding.

The naiveCopy function could be re-written in asm, and the copyHuff
method could probably be rolled into huffmanBlock and copyHist, but
I'm leaving those changes for future CLs.

compress/flate benchmarks:
benchmark old ns/op new ns/op
delta
BenchmarkDecoderBestSpeed1K 385327 431482
+11.98%
BenchmarkDecoderBestSpeed10K 1245190 1062026
-14.71%
BenchmarkDecoderBestSpeed100K 8512365 5838158
-31.42%
BenchmarkDecoderDefaultCompression1K 382225 422697
+10.59%
BenchmarkDecoderDefaultCompression10K 867950 615917
-29.04%
BenchmarkDecoderDefaultCompression100K 5658240 2477376
-56.22%
BenchmarkDecoderBestCompression1K 383760 422538
+10.10%
BenchmarkDecoderBestCompression10K 867743 615757
-29.04%
BenchmarkDecoderBestCompression100K 5660160 2478030
-56.22%

image/png benchmarks:
benchmark old ns/op new ns/op delta
BenchmarkDecodeGray 2540834 2462029 -3.10%
BenchmarkDecodeNRGBAGradient 10052700 9781590 -2.70%
BenchmarkDecodeNRGBAOpaque 8704710 8371630 -3.83%
BenchmarkDecodePaletted 1458779 1396961 -4.24%
BenchmarkDecodeRGB 7183606 6951584 -3.23%

Wall time for Denis Cheremisov's PNG-decoding program given in
https://groups.google.com/group/golang-nuts/browse_thread/thread/22aa8a05040fdd49
Before: 3.07s
After: 2.55s
Delta: -17%

Before profile:
Total: 304 samples
159 52.3% 52.3% 251 82.6%
compress/flate.(*decompressor).huffmanBlock
58 19.1% 71.4% 76 25.0%
compress/flate.(*decompressor).huffSym
32 10.5% 81.9% 32 10.5% hash/adler32.update
16 5.3% 87.2% 22 7.2% bufio.(*Reader).ReadByte
16 5.3% 92.4% 37 12.2%
compress/flate.(*decompressor).moreBits
7 2.3% 94.7% 7 2.3% hash/crc32.update
7 2.3% 97.0% 7 2.3% runtime.memmove
5 1.6% 98.7% 5 1.6% scanblock
2 0.7% 99.3% 9 3.0% runtime.copy
1 0.3% 99.7% 1 0.3%
compress/flate.(*huffmanDecoder).init

After profile:
Total: 253 samples
75 29.6% 29.6% 88 34.8%
compress/flate.(*decompressor).huffSym
55 21.7% 51.4% 55 21.7% compress/flate.naiveCopy
30 11.9% 63.2% 30 11.9% hash/adler32.update
24 9.5% 72.7% 194 76.7%
compress/flate.(*decompressor).huffmanBlock
17 6.7% 79.4% 33 13.0%
compress/flate.(*decompressor).moreBits
11 4.3% 83.8% 17 6.7% bufio.(*Reader).ReadByte
10 4.0% 87.7% 10 4.0% runtime.memmove
9 3.6% 91.3% 67 26.5%
compress/flate.(*decompressor).copyHist
7 2.8% 94.1% 7 2.8% hash/crc32.update
7 2.8% 96.8% 7 2.8% scanblock

Please review this at http://codereview.appspot.com/6127064/

Affected files:
A src/pkg/compress/flate/copy.go
A src/pkg/compress/flate/copy_test.go
M src/pkg/compress/flate/inflate.go


rogp...@gmail.com

unread,
Apr 30, 2012, 8:04:31 AM4/30/12
to nige...@golang.org, r...@golang.org, da...@cheney.net, golan...@googlegroups.com, kra...@golang.org, re...@codereview-hr.appspotmail.com

http://codereview.appspot.com/6127064/diff/4001/src/pkg/compress/flate/copy.go
File src/pkg/compress/flate/copy.go (right):

http://codereview.appspot.com/6127064/diff/4001/src/pkg/compress/flate/copy.go#newcode13
src/pkg/compress/flate/copy.go:13: for i := range dst {
it seems to me that it might be slightly faster if you saved a bounds
check by doing this:
for i, b := range src[:len(dst)] {
dst[i] = b
}

http://codereview.appspot.com/6127064/

da...@cheney.net

unread,
Apr 30, 2012, 8:20:36 AM4/30/12
to nige...@golang.org, r...@golang.org, rogp...@gmail.com, golan...@googlegroups.com, kra...@golang.org, re...@codereview-hr.appspotmail.com
> for i, b := range src[:len(dst)] {
> dst[i] = b
> }

nice catch.


http://codereview.appspot.com/6127064/

r...@golang.org

unread,
Apr 30, 2012, 4:38:02 PM4/30/12
to nige...@golang.org, rogp...@gmail.com, da...@cheney.net, golan...@googlegroups.com, kra...@golang.org, re...@codereview-hr.appspotmail.com
LGTM
On 2012/04/30 12:04:31, rog wrote:
> it seems to me that it might be slightly faster if you saved a bounds
check by
> doing this:
> for i, b := range src[:len(dst)] {
> dst[i] = b
> }

This is true.

Is it worth picking off the case where there is no overlap and
invoking the real copy?

http://codereview.appspot.com/6127064/diff/4001/src/pkg/compress/flate/inflate.go
File src/pkg/compress/flate/inflate.go (right):

http://codereview.appspot.com/6127064/diff/4001/src/pkg/compress/flate/inflate.go#newcode517
src/pkg/compress/flate/inflate.go:517: // It returns whether the f.hist
buffer is full.
s/returns/reports/

http://codereview.appspot.com/6127064/

Nigel Tao

unread,
Apr 30, 2012, 8:17:00 PM4/30/12
to nige...@golang.org, r...@golang.org, rogp...@gmail.com, da...@cheney.net, golan...@googlegroups.com, kra...@golang.org, re...@codereview-hr.appspotmail.com
On second thoughts, I've renamed naiveCopy to forwardCopy.

Submitting...


On 30 April 2012 22:04, <rogp...@gmail.com> wrote:
> it seems to me that it might be slightly faster if you saved a bounds
> check by doing this:
> for i, b := range src[:len(dst)] {
>   dst[i] = b
> }

Indeed. The compress/flate microbenchmarks (which decode the
compressed digits of e) don't show much difference, but the image/png
microbenchmarks show a further 3% gain, and Denis Cheremisov's program
is now down to 2.32s (compared to 3.07s before this CL, and 2.55s with
the first cut of this CL). I've updated this change description with
newer numbers.


On 1 May 2012 06:38, <r...@golang.org> wrote:
> Is it worth picking off the case where there is no overlap and
> invoking the real copy?

The extra complexity doesn't seem to help: "go test compress/flate"
goes from 1.89s to 1.99s, and Denis Cheremisov's program shows no
significant difference. For the record, my code was:

if len(dst) == 0 || len(src) == 0 {
return 0
}
// Check if the two slices are backed by the same array,
// by comparing the address of the cap'th element.
cd, cs := cap(dst), cap(src)
if d, s := dst[:cd], src[:cs]; &d[cd-1] == &s[cs-1] {
// Do a manual copy if they overlap.
d0 := len(d)
d1 := d0 - len(dst)
s0 := len(s)
s1 := s0 - len(src)
// Note that d0 >= d1 and s0 >= s1, so that if d0 <= s1, then we have an
// ordering s0 >= s1 >= d0 >= d1, and the two slices do not overlap.
// Similarly, if s0 <= d1, then the two slices do not overlap.
if d0 > s1 && s0 > d1 {
if len(src) > len(dst) {
src = src[:len(dst)]
}
for i, x := range src {
dst[i] = x
}
return len(src)
}
}
return copy(dst, src)

nige...@golang.org

unread,
Apr 30, 2012, 8:51:44 PM4/30/12
to nige...@golang.org, r...@golang.org, rogp...@gmail.com, da...@cheney.net, golan...@googlegroups.com, kra...@golang.org, re...@codereview-hr.appspotmail.com
*** Submitted as
http://code.google.com/p/go/source/detail?r=3a8932ef3669 ***

compress/flate: optimize history-copy decoding.

The forwardCopy function could be re-written in asm, and the copyHuff
method could probably be rolled into huffmanBlock and copyHist, but
I'm leaving those changes for future CLs.

compress/flate benchmarks:
benchmark old ns/op new ns/op
delta
BenchmarkDecoderBestSpeed1K 385327 435140
+12.93%
BenchmarkDecoderBestSpeed10K 1245190 1062112
-14.70%
BenchmarkDecoderBestSpeed100K 8512365 5833680
-31.47%
BenchmarkDecoderDefaultCompression1K 382225 421301
+10.22%
BenchmarkDecoderDefaultCompression10K 867950 613890
-29.27%
BenchmarkDecoderDefaultCompression100K 5658240 2466726
-56.40%
BenchmarkDecoderBestCompression1K 383760 421634
+9.87%
BenchmarkDecoderBestCompression10K 867743 614671
-29.16%
BenchmarkDecoderBestCompression100K 5660160 2464996
-56.45%

image/png benchmarks:
benchmark old ns/op new ns/op delta
BenchmarkDecodeGray 2540834 2389624 -5.95%
BenchmarkDecodeNRGBAGradient 10052700 9534565 -5.15%
BenchmarkDecodeNRGBAOpaque 8704710 8163430 -6.22%
BenchmarkDecodePaletted 1458779 1325017 -9.17%
BenchmarkDecodeRGB 7183606 6794668 -5.41%

Wall time for Denis Cheremisov's PNG-decoding program given in
https://groups.google.com/group/golang-nuts/browse_thread/thread/22aa8a05040fdd49
Before: 3.07s
After: 2.32s
Delta: -24%

Before profile:
Total: 304 samples
159 52.3% 52.3% 251 82.6%
compress/flate.(*decompressor).huffmanBlock
58 19.1% 71.4% 76 25.0%
compress/flate.(*decompressor).huffSym
32 10.5% 81.9% 32 10.5% hash/adler32.update
16 5.3% 87.2% 22 7.2% bufio.(*Reader).ReadByte
16 5.3% 92.4% 37 12.2%
compress/flate.(*decompressor).moreBits
7 2.3% 94.7% 7 2.3% hash/crc32.update
7 2.3% 97.0% 7 2.3% runtime.memmove
5 1.6% 98.7% 5 1.6% scanblock
2 0.7% 99.3% 9 3.0% runtime.copy
1 0.3% 99.7% 1 0.3%
compress/flate.(*huffmanDecoder).init

After profile:
Total: 230 samples
59 25.7% 25.7% 70 30.4%
compress/flate.(*decompressor).huffSym
45 19.6% 45.2% 45 19.6% hash/adler32.update
35 15.2% 60.4% 35 15.2% compress/flate.forwardCopy
20 8.7% 69.1% 151 65.7%
compress/flate.(*decompressor).huffmanBlock
16 7.0% 76.1% 24 10.4%
compress/flate.(*decompressor).moreBits
15 6.5% 82.6% 15 6.5% runtime.memmove
11 4.8% 87.4% 50 21.7%
compress/flate.(*decompressor).copyHist
7 3.0% 90.4% 7 3.0% hash/crc32.update
6 2.6% 93.0% 9 3.9% bufio.(*Reader).ReadByte
4 1.7% 94.8% 4 1.7% runtime.slicearray

R=rsc, rogpeppe, dave
CC=golang-dev, krasin
http://codereview.appspot.com/6127064


http://codereview.appspot.com/6127064/

Nigel Tao

unread,
May 1, 2012, 12:33:47 AM5/1/12
to nige...@golang.org, r...@golang.org, rogp...@gmail.com, da...@cheney.net, golan...@googlegroups.com, kra...@golang.org, re...@codereview-hr.appspotmail.com
On 1 May 2012 10:51, <nige...@golang.org> wrote:
> The forwardCopy function could be re-written in asm,

For the curious, I tried re-writing compress/flate's forwardCopy in
asm. It looks like a win for very long, highly flate-compressed
inputs, but it's not as clear otherwise (such as decoding PNG files).
I don't know if it's worth submitting.

Code and benchmarks are at
http://codereview.appspot.com/6137062

Also, I'm not really fluent in amd64 asm, so I might have done something dumb.
Reply all
Reply to author
Forward
0 new messages