[go] compress/flate: improve decompression speed

80 views
Skip to first unread message

Gerrit Bot (Gerrit)

unread,
Apr 9, 2020, 7:09:46 AM4/9/20
to Klaus Post, goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

Gerrit Bot would like Klaus Post to review this change.

View Change

compress/flate: improve decompression speed

Improve inflate decompression speed, mainly through 3 optimizations:

1) Read further ahead on non-final blocks.

The reader guarantees that it will not read beyond the end of the stream.
This poses limitations on the decoder in terms of how far it can read ahead
and is set to the size of an end-of-block marker in `f.h1.min = f.bits[endBlockMarker]`.

We can however take advantage of the fact that each block gives
information on whether it is the final block on a stream.
So if we are not reading the final block we can safely add the size
of the smallest block possible with nothing but an EOB marker.

That is a block with a predefined table and a single EOB marker.
Since we know the size of the block header and the encoding
of the EOB this totals to 10 additional bits.
Adding 10 bits reduces the number of stream reads significantly.

Approximately 5% throughput increase.

2) Manually inline f.huffSym call

This change by itself give about about 13% throughput increase.

3) Generate decoders for stdlib io.ByteReader types

We generate decoders for the known implementations of `io.ByteReader`,
namely `*bytes.Buffer`, `*bytes.Reader`, `*bufio.Reader` and `*strings.Reader`.

This change by itself gives about 20-25% throughput increase,
including when an `io.Reader` is passed.

I would say only `*strings.Reader` probably isn't that common.

Minor changes:

* Reuse `h.chunks` and `h.links`.
* Trade some bounds checks for AND operations.
* Change chunks from uint32 to uint16.
* Avoid padding of decompressor struct members.

Per loop allocation removed from benchmarks.
The numbers in the benchmark below includes this change for the 'old' numbers.

```
name old time/op new time/op delta
Decode/Digits/Huffman/1e4-32 78.0µs ± 0% 50.5µs ± 1% -35.26% (p=0.008 n=5+5)
Decode/Digits/Huffman/1e5-32 779µs ± 2% 487µs ± 0% -37.48% (p=0.008 n=5+5)
Decode/Digits/Huffman/1e6-32 7.68ms ± 0% 4.88ms ± 1% -36.44% (p=0.008 n=5+5)
Decode/Digits/Speed/1e4-32 88.5µs ± 1% 59.9µs ± 1% -32.33% (p=0.008 n=5+5)
Decode/Digits/Speed/1e5-32 963µs ± 1% 678µs ± 1% -29.58% (p=0.008 n=5+5)
Decode/Digits/Speed/1e6-32 9.75ms ± 1% 6.90ms ± 0% -29.21% (p=0.008 n=5+5)
Decode/Digits/Default/1e4-32 91.2µs ± 1% 61.4µs ± 0% -32.72% (p=0.008 n=5+5)
Decode/Digits/Default/1e5-32 954µs ± 0% 675µs ± 0% -29.25% (p=0.008 n=5+5)
Decode/Digits/Default/1e6-32 9.67ms ± 0% 6.79ms ± 1% -29.76% (p=0.008 n=5+5)
Decode/Digits/Compression/1e4-32 90.7µs ± 1% 61.5µs ± 1% -32.21% (p=0.008 n=5+5)
Decode/Digits/Compression/1e5-32 953µs ± 1% 672µs ± 0% -29.46% (p=0.016 n=4+5)
Decode/Digits/Compression/1e6-32 9.76ms ± 4% 6.78ms ± 0% -30.54% (p=0.008 n=5+5)
Decode/Newton/Huffman/1e4-32 90.4µs ± 0% 54.7µs ± 0% -39.52% (p=0.008 n=5+5)
Decode/Newton/Huffman/1e5-32 885µs ± 0% 538µs ± 0% -39.19% (p=0.008 n=5+5)
Decode/Newton/Huffman/1e6-32 8.84ms ± 0% 5.44ms ± 0% -38.46% (p=0.016 n=4+5)
Decode/Newton/Speed/1e4-32 81.5µs ± 0% 55.1µs ± 1% -32.42% (p=0.016 n=4+5)
Decode/Newton/Speed/1e5-32 751µs ± 4% 528µs ± 0% -29.70% (p=0.008 n=5+5)
Decode/Newton/Speed/1e6-32 7.49ms ± 2% 5.32ms ± 0% -28.92% (p=0.008 n=5+5)
Decode/Newton/Default/1e4-32 73.3µs ± 1% 48.9µs ± 1% -33.36% (p=0.008 n=5+5)
Decode/Newton/Default/1e5-32 601µs ± 2% 418µs ± 0% -30.40% (p=0.008 n=5+5)
Decode/Newton/Default/1e6-32 5.92ms ± 0% 4.17ms ± 0% -29.60% (p=0.008 n=5+5)
Decode/Newton/Compression/1e4-32 72.7µs ± 0% 48.5µs ± 0% -33.21% (p=0.008 n=5+5)
Decode/Newton/Compression/1e5-32 597µs ± 0% 418µs ± 0% -29.90% (p=0.008 n=5+5)
Decode/Newton/Compression/1e6-32 5.90ms ± 0% 4.15ms ± 0% -29.63% (p=0.016 n=4+5)

name old speed new speed delta
Decode/Digits/Huffman/1e4-32 128MB/s ± 0% 198MB/s ± 1% +54.46% (p=0.008 n=5+5)
Decode/Digits/Huffman/1e5-32 128MB/s ± 2% 205MB/s ± 0% +59.92% (p=0.008 n=5+5)
Decode/Digits/Huffman/1e6-32 130MB/s ± 0% 205MB/s ± 1% +57.33% (p=0.008 n=5+5)
Decode/Digits/Speed/1e4-32 113MB/s ± 1% 167MB/s ± 1% +47.79% (p=0.008 n=5+5)
Decode/Digits/Speed/1e5-32 104MB/s ± 1% 147MB/s ± 1% +42.01% (p=0.008 n=5+5)
Decode/Digits/Speed/1e6-32 103MB/s ± 1% 145MB/s ± 0% +41.26% (p=0.008 n=5+5)
Decode/Digits/Default/1e4-32 110MB/s ± 1% 163MB/s ± 0% +48.63% (p=0.008 n=5+5)
Decode/Digits/Default/1e5-32 105MB/s ± 0% 148MB/s ± 0% +41.34% (p=0.008 n=5+5)
Decode/Digits/Default/1e6-32 103MB/s ± 0% 147MB/s ± 1% +42.37% (p=0.008 n=5+5)
Decode/Digits/Compression/1e4-32 110MB/s ± 1% 163MB/s ± 1% +47.51% (p=0.008 n=5+5)
Decode/Digits/Compression/1e5-32 105MB/s ± 1% 149MB/s ± 0% +41.77% (p=0.016 n=4+5)
Decode/Digits/Compression/1e6-32 102MB/s ± 4% 147MB/s ± 0% +43.91% (p=0.008 n=5+5)
Decode/Newton/Huffman/1e4-32 111MB/s ± 0% 183MB/s ± 0% +65.35% (p=0.008 n=5+5)
Decode/Newton/Huffman/1e5-32 113MB/s ± 0% 186MB/s ± 0% +64.44% (p=0.008 n=5+5)
Decode/Newton/Huffman/1e6-32 113MB/s ± 0% 184MB/s ± 0% +62.50% (p=0.016 n=4+5)
Decode/Newton/Speed/1e4-32 123MB/s ± 0% 182MB/s ± 1% +47.98% (p=0.016 n=4+5)
Decode/Newton/Speed/1e5-32 133MB/s ± 4% 189MB/s ± 0% +42.20% (p=0.008 n=5+5)
Decode/Newton/Speed/1e6-32 134MB/s ± 2% 188MB/s ± 0% +40.67% (p=0.008 n=5+5)
Decode/Newton/Default/1e4-32 136MB/s ± 1% 205MB/s ± 1% +50.05% (p=0.008 n=5+5)
Decode/Newton/Default/1e5-32 166MB/s ± 2% 239MB/s ± 0% +43.67% (p=0.008 n=5+5)
Decode/Newton/Default/1e6-32 169MB/s ± 0% 240MB/s ± 0% +42.04% (p=0.008 n=5+5)
Decode/Newton/Compression/1e4-32 138MB/s ± 0% 206MB/s ± 0% +49.73% (p=0.008 n=5+5)
Decode/Newton/Compression/1e5-32 168MB/s ± 0% 239MB/s ± 0% +42.66% (p=0.008 n=5+5)
Decode/Newton/Compression/1e6-32 170MB/s ± 0% 241MB/s ± 0% +42.11% (p=0.016 n=4+5)

name old alloc/op new alloc/op delta
Decode/Digits/Huffman/1e4-32 0.00B ±NaN% 16.00B ± 0% +Inf% (p=0.008 n=5+5)
Decode/Digits/Huffman/1e5-32 7.60B ± 8% 32.00B ± 0% +321.05% (p=0.008 n=5+5)
Decode/Digits/Huffman/1e6-32 79.6B ± 1% 264.0B ± 0% +231.66% (p=0.008 n=5+5)
Decode/Digits/Speed/1e4-32 80.0B ± 0% 16.0B ± 0% -80.00% (p=0.008 n=5+5)
Decode/Digits/Speed/1e5-32 297B ± 0% 33B ± 0% ~ (p=0.079 n=4+5)
Decode/Digits/Speed/1e6-32 3.86kB ± 0% 0.27kB ± 0% -92.98% (p=0.008 n=5+5)
Decode/Digits/Default/1e4-32 48.0B ± 0% 16.0B ± 0% -66.67% (p=0.008 n=5+5)
Decode/Digits/Default/1e5-32 297B ± 0% 49B ± 0% -83.50% (p=0.008 n=5+5)
Decode/Digits/Default/1e6-32 4.28kB ± 0% 0.38kB ± 0% ~ (p=0.079 n=4+5)
Decode/Digits/Compression/1e4-32 48.0B ± 0% 16.0B ± 0% -66.67% (p=0.008 n=5+5)
Decode/Digits/Compression/1e5-32 297B ± 0% 49B ± 0% ~ (p=0.079 n=4+5)
Decode/Digits/Compression/1e6-32 4.28kB ± 0% 0.38kB ± 0% -91.09% (p=0.000 n=4+5)
Decode/Newton/Huffman/1e4-32 705B ± 0% 16B ± 0% -97.73% (p=0.008 n=5+5)
Decode/Newton/Huffman/1e5-32 4.50kB ± 0% 0.03kB ± 0% -99.27% (p=0.008 n=5+5)
Decode/Newton/Huffman/1e6-32 39.4kB ± 0% 0.3kB ± 0% -99.29% (p=0.008 n=5+5)
Decode/Newton/Speed/1e4-32 625B ± 0% 16B ± 0% -97.44% (p=0.008 n=5+5)
Decode/Newton/Speed/1e5-32 3.21kB ± 0% 0.03kB ± 0% -98.97% (p=0.008 n=5+5)
Decode/Newton/Speed/1e6-32 40.6kB ± 0% 0.3kB ± 0% -99.25% (p=0.008 n=5+5)
Decode/Newton/Default/1e4-32 513B ± 0% 16B ± 0% -96.88% (p=0.008 n=5+5)
Decode/Newton/Default/1e5-32 2.37kB ± 0% 0.03kB ± 0% -98.61% (p=0.008 n=5+5)
Decode/Newton/Default/1e6-32 21.2kB ± 0% 0.2kB ± 0% -98.97% (p=0.008 n=5+5)
Decode/Newton/Compression/1e4-32 513B ± 0% 16B ± 0% -96.88% (p=0.008 n=5+5)
Decode/Newton/Compression/1e5-32 2.37kB ± 0% 0.03kB ± 0% -98.61% (p=0.008 n=5+5)
Decode/Newton/Compression/1e6-32 23.0kB ± 0% 0.2kB ± 0% -99.07% (p=0.008 n=5+5)

name old allocs/op new allocs/op delta
Decode/Digits/Huffman/1e4-32 0.00 ±NaN% 1.00 ± 0% +Inf% (p=0.008 n=5+5)
Decode/Digits/Huffman/1e5-32 0.00 ±NaN% 2.00 ± 0% +Inf% (p=0.008 n=5+5)
Decode/Digits/Huffman/1e6-32 0.00 ±NaN% 16.00 ± 0% +Inf% (p=0.008 n=5+5)
Decode/Digits/Speed/1e4-32 3.00 ± 0% 1.00 ± 0% -66.67% (p=0.008 n=5+5)
Decode/Digits/Speed/1e5-32 6.00 ± 0% 2.00 ± 0% -66.67% (p=0.008 n=5+5)
Decode/Digits/Speed/1e6-32 68.0 ± 0% 16.0 ± 0% -76.47% (p=0.008 n=5+5)
Decode/Digits/Default/1e4-32 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.008 n=5+5)
Decode/Digits/Default/1e5-32 8.00 ± 0% 3.00 ± 0% -62.50% (p=0.008 n=5+5)
Decode/Digits/Default/1e6-32 74.0 ± 0% 23.0 ± 0% -68.92% (p=0.008 n=5+5)
Decode/Digits/Compression/1e4-32 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.008 n=5+5)
Decode/Digits/Compression/1e5-32 8.00 ± 0% 3.00 ± 0% -62.50% (p=0.008 n=5+5)
Decode/Digits/Compression/1e6-32 74.0 ± 0% 23.0 ± 0% -68.92% (p=0.008 n=5+5)
Decode/Newton/Huffman/1e4-32 9.00 ± 0% 1.00 ± 0% -88.89% (p=0.008 n=5+5)
Decode/Newton/Huffman/1e5-32 18.0 ± 0% 2.0 ± 0% -88.89% (p=0.008 n=5+5)
Decode/Newton/Huffman/1e6-32 156 ± 0% 16 ± 0% -89.74% (p=0.008 n=5+5)
Decode/Newton/Speed/1e4-32 13.0 ± 0% 1.0 ± 0% -92.31% (p=0.008 n=5+5)
Decode/Newton/Speed/1e5-32 26.0 ± 0% 2.0 ± 0% -92.31% (p=0.008 n=5+5)
Decode/Newton/Speed/1e6-32 223 ± 0% 16 ± 0% -92.83% (p=0.008 n=5+5)
Decode/Newton/Default/1e4-32 10.0 ± 0% 1.0 ± 0% -90.00% (p=0.008 n=5+5)
Decode/Newton/Default/1e5-32 27.0 ± 0% 2.0 ± 0% -92.59% (p=0.008 n=5+5)
Decode/Newton/Default/1e6-32 153 ± 0% 12 ± 0% -92.16% (p=0.008 n=5+5)
Decode/Newton/Compression/1e4-32 10.0 ± 0% 1.0 ± 0% -90.00% (p=0.008 n=5+5)
Decode/Newton/Compression/1e5-32 27.0 ± 0% 2.0 ± 0% -92.59% (p=0.008 n=5+5)
Decode/Newton/Compression/1e6-32 145 ± 0% 12 ± 0% -91.72% (p=0.008 n=5+5)
```

These changes have been included in github.com/klauspost/compress
for a little more than a month now, which includes fuzz testing.

Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
GitHub-Last-Rev: 6180f3c8841bad9184e2d9d43c6829eb9b7a63e5
GitHub-Pull-Request: golang/go#38324
---
A src/compress/flate/gen_inflate.go
M src/compress/flate/inflate.go
A src/compress/flate/inflate_gen.go
M src/compress/flate/reader_test.go
4 files changed, 1,220 insertions(+), 46 deletions(-)

diff --git a/src/compress/flate/gen_inflate.go b/src/compress/flate/gen_inflate.go
new file mode 100644
index 0000000..9db11f3
--- /dev/null
+++ b/src/compress/flate/gen_inflate.go
@@ -0,0 +1,259 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Package flate implements the DEFLATE compressed data format, described in
+// RFC 1951. The gzip and zlib packages implement access to DEFLATE-based file
+// formats.
+
+// +build generate
+
+//go:generate go run $GOFILE && gofmt -w inflate_gen.go
+
+package main
+
+import (
+ "os"
+ "strings"
+)
+
+func main() {
+ f, err := os.Create("inflate_gen.go")
+ if err != nil {
+ panic(err)
+ }
+ defer f.Close()
+ types := []string{"*bytes.Buffer", "*bytes.Reader", "*bufio.Reader", "*strings.Reader"}
+ names := []string{"BytesBuffer", "BytesReader", "BufioReader", "StringsReader"}
+ imports := []string{"bytes", "bufio", "io", "strings", "math/bits"}
+ f.WriteString(`// Code generated by go generate gen_inflate.go. DO NOT EDIT.
+
+package flate
+
+import (
+`)
+
+ for _, imp := range imports {
+ f.WriteString("\t\"" + imp + "\"\n")
+ }
+ f.WriteString(")\n\n")
+
+ template := `
+
+// $FUNCNAME$ decodes a single Huffman block from f.
+// f.r must be a $TYPE$.
+// hl and hd are the Huffman states for the lit/length values
+// and the distance values, respectively. If hd == nil, using the
+// fixed distance encoding associated with fixed Huffman blocks.
+func (f *decompressor) $FUNCNAME$() {
+ const (
+ stateInit = iota // Zero value must be stateInit
+ stateDict
+ )
+ fr := f.r.($TYPE$)
+ moreBits := func() error {
+ c, err := fr.ReadByte()
+ if err != nil {
+ return noEOF(err)
+ }
+ f.roffset++
+ f.b |= uint32(c) << f.nb
+ f.nb += 8
+ return nil
+ }
+
+ switch f.stepState {
+ case stateInit:
+ goto readLiteral
+ case stateDict:
+ goto copyHistory
+ }
+
+readLiteral:
+ // Read literal and/or (length, distance) according to RFC section 3.2.3.
+ {
+ var v int
+ {
+ // Inlined v, err := f.huffSym(f.hl)
+ // Since a huffmanDecoder can be empty or be composed of a degenerate tree
+ // with single element, huffSym must error on these two edge cases. In both
+ // cases, the chunks slice will be 0 for the invalid sequence, leading it
+ // satisfy the n == 0 check below.
+ n := uint(f.hl.maxRead)
+ // Optimization. Compiler isn't smart enough to keep f.b,f.nb in registers,
+ // but is smart enough to keep local variables in registers, so use nb and b,
+ // inline call to moreBits and reassign b,nb back to f on return.
+ nb, b := f.nb, f.b
+ for {
+ for nb < n {
+ c, err := fr.ReadByte()
+ if err != nil {
+ f.b = b
+ f.nb = nb
+ f.err = noEOF(err)
+ return
+ }
+ f.roffset++
+ b |= uint32(c) << (nb & 31)
+ nb += 8
+ }
+ chunk := f.hl.chunks[b&(huffmanNumChunks-1)]
+ n = uint(chunk & huffmanCountMask)
+ if n > huffmanChunkBits {
+ chunk = f.hl.links[chunk>>huffmanValueShift][(b>>huffmanChunkBits)&f.hl.linkMask]
+ n = uint(chunk & huffmanCountMask)
+ }
+ if n <= nb {
+ if n == 0 {
+ f.b = b
+ f.nb = nb
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+ f.b = b >> (n & 31)
+ f.nb = nb - n
+ v = int(chunk >> huffmanValueShift)
+ break
+ }
+ }
+ }
+
+ var n uint // number of bits extra
+ var length int
+ var err error
+ switch {
+ case v < 256:
+ f.dict.writeByte(byte(v))
+ if f.dict.availWrite() == 0 {
+ f.toRead = f.dict.readFlush()
+ f.step = (*decompressor).$FUNCNAME$
+ f.stepState = stateInit
+ return
+ }
+ goto readLiteral
+ case v == 256:
+ f.finishBlock()
+ return
+ // otherwise, reference to older data
+ case v < 265:
+ length = v - (257 - 3)
+ n = 0
+ case v < 269:
+ length = v*2 - (265*2 - 11)
+ n = 1
+ case v < 273:
+ length = v*4 - (269*4 - 19)
+ n = 2
+ case v < 277:
+ length = v*8 - (273*8 - 35)
+ n = 3
+ case v < 281:
+ length = v*16 - (277*16 - 67)
+ n = 4
+ case v < 285:
+ length = v*32 - (281*32 - 131)
+ n = 5
+ case v < maxNumLit:
+ length = 258
+ n = 0
+ default:
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+ if n > 0 {
+ for f.nb < n {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ length += int(f.b & uint32(1<<n-1))
+ f.b >>= n
+ f.nb -= n
+ }
+
+ var dist int
+ if f.hd == nil {
+ for f.nb < 5 {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ dist = int(bits.Reverse8(uint8(f.b & 0x1F << 3)))
+ f.b >>= 5
+ f.nb -= 5
+ } else {
+ if dist, err = f.huffSym(f.hd); err != nil {
+ f.err = err
+ return
+ }
+ }
+
+ switch {
+ case dist < 4:
+ dist++
+ case dist < maxNumDist:
+ nb := uint(dist-2) >> 1
+ // have 1 bit in bottom of dist, need nb more.
+ extra := (dist & 1) << nb
+ for f.nb < nb {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ extra |= int(f.b & uint32(1<<nb-1))
+ f.b >>= nb
+ f.nb -= nb
+ dist = 1<<(nb+1) + 1 + extra
+ default:
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+
+ // No check on length; encoding can be prescient.
+ if dist > f.dict.histSize() {
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+
+ f.copyLen, f.copyDist = length, dist
+ goto copyHistory
+ }
+
+copyHistory:
+ // Perform a backwards copy according to RFC section 3.2.3.
+ {
+ cnt := f.dict.tryWriteCopy(f.copyDist, f.copyLen)
+ if cnt == 0 {
+ cnt = f.dict.writeCopy(f.copyDist, f.copyLen)
+ }
+ f.copyLen -= cnt
+
+ if f.dict.availWrite() == 0 || f.copyLen > 0 {
+ f.toRead = f.dict.readFlush()
+ f.step = (*decompressor).$FUNCNAME$ // We need to continue this work
+ f.stepState = stateDict
+ return
+ }
+ goto readLiteral
+ }
+}
+
+`
+ for i, t := range types {
+ s := strings.Replace(template, "$FUNCNAME$", "huffman"+names[i], -1)
+ s = strings.Replace(s, "$TYPE$", t, -1)
+ f.WriteString(s)
+ }
+ f.WriteString("func (f *decompressor) huffmanBlockDecoder() func() {\n")
+ f.WriteString("\tswitch f.r.(type) {\n")
+ for i, t := range types {
+ f.WriteString("\t\tcase " + t + ":\n")
+ f.WriteString("\t\t\treturn f.huffman" + names[i] + "\n")
+ }
+ f.WriteString("\t\tdefault:\n")
+ f.WriteString("\t\t\treturn f.huffmanBlockGeneric")
+ f.WriteString("\t}\n}\n")
+}
diff --git a/src/compress/flate/inflate.go b/src/compress/flate/inflate.go
index 4992139..0145ee8 100644
--- a/src/compress/flate/inflate.go
+++ b/src/compress/flate/inflate.go
@@ -16,7 +16,8 @@
)

const (
- maxCodeLen = 16 // max length of Huffman code
+ maxCodeLen = 16 // max length of Huffman code
+ maxCodeLenMask = 15 // mask for max length of Huffman code
// The next three numbers come from the RFC section 3.2.7, with the
// additional proviso in section 3.2.5 which implies that distance codes
// 30 and 31 should never occur in compressed data.
@@ -102,10 +103,10 @@
)

type huffmanDecoder struct {
- min int // the minimum code length
- chunks [huffmanNumChunks]uint32 // chunks as described above
- links [][]uint32 // overflow links
- linkMask uint32 // mask the width of the link table
+ maxRead int // the maximum number of bits we can read and not overread
+ chunks *[huffmanNumChunks]uint16 // chunks as described above
+ links [][]uint16 // overflow links
+ linkMask uint32 // mask the width of the link table
}

// Initialize Huffman decoding tables from array of code lengths.
@@ -119,12 +120,15 @@
// development to supplement the currently ad-hoc unit tests.
const sanity = false

- if h.min != 0 {
- *h = huffmanDecoder{}
+ if h.chunks == nil {
+ h.chunks = &[huffmanNumChunks]uint16{}
+ }
+ if h.maxRead != 0 {
+ *h = huffmanDecoder{chunks: h.chunks, links: h.links}
}

// Count number of codes of each length,
- // compute min and max length.
+ // compute maxRead and max length.
var count [maxCodeLen]int
var min, max int
for _, n := range lengths {
@@ -137,7 +141,7 @@
if n > max {
max = n
}
- count[n]++
+ count[n&maxCodeLenMask]++
}

// Empty tree. The decompressor.huffSym function will fail later if the tree
@@ -155,8 +159,8 @@
var nextcode [maxCodeLen]int
for i := min; i <= max; i++ {
code <<= 1
- nextcode[i] = code
- code += count[i]
+ nextcode[i&maxCodeLenMask] = code
+ code += count[i&maxCodeLenMask]
}

// Check that the coding is complete (i.e., that we've
@@ -168,14 +172,23 @@
return false
}

- h.min = min
+ h.maxRead = min
+ chunks := h.chunks[:]
+ for i := range chunks {
+ chunks[i] = 0
+ }
+
if max > huffmanChunkBits {
numLinks := 1 << (uint(max) - huffmanChunkBits)
h.linkMask = uint32(numLinks - 1)

// create link tables
link := nextcode[huffmanChunkBits+1] >> 1
- h.links = make([][]uint32, huffmanNumChunks-link)
+ if cap(h.links) < huffmanNumChunks-link {
+ h.links = make([][]uint16, huffmanNumChunks-link)
+ } else {
+ h.links = h.links[:huffmanNumChunks-link]
+ }
for j := uint(link); j < huffmanNumChunks; j++ {
reverse := int(bits.Reverse16(uint16(j)))
reverse >>= uint(16 - huffmanChunkBits)
@@ -183,9 +196,16 @@
if sanity && h.chunks[reverse] != 0 {
panic("impossible: overwriting existing chunk")
}
- h.chunks[reverse] = uint32(off<<huffmanValueShift | (huffmanChunkBits + 1))
- h.links[off] = make([]uint32, numLinks)
+ h.chunks[reverse] = uint16(off<<huffmanValueShift | (huffmanChunkBits + 1))
+ if cap(h.links[off]) < numLinks {
+ h.links[off] = make([]uint16, numLinks)
+ } else {
+ links := h.links[off][:0]
+ h.links[off] = links[:numLinks]
+ }
}
+ } else {
+ h.links = h.links[:0]
}

for i, n := range lengths {
@@ -194,7 +214,7 @@
}
code := nextcode[n]
nextcode[n]++
- chunk := uint32(i<<huffmanValueShift | n)
+ chunk := uint16(i<<huffmanValueShift | n)
reverse := int(bits.Reverse16(uint16(code)))
reverse >>= uint(16 - n)
if n <= huffmanChunkBits {
@@ -269,10 +289,6 @@
r Reader
roffset int64

- // Input bits, in top of b.
- b uint32
- nb uint
-
// Huffman decoders for literal/length, distance.
h1, h2 huffmanDecoder

@@ -283,19 +299,24 @@
// Output history, buffer.
dict dictDecoder

- // Temporary buffer (avoids repeated allocation).
- buf [4]byte
-
// Next step in the decompression,
// and decompression state.
step func(*decompressor)
stepState int
- final bool
err error
toRead []byte
hl, hd *huffmanDecoder
copyLen int
copyDist int
+
+ // Temporary buffer (avoids repeated allocation).
+ buf [4]byte
+
+ // Input bits, in top of b.
+ b uint32
+
+ nb uint
+ final bool
}

func (f *decompressor) nextBlock() {
@@ -316,7 +337,7 @@
// compressed, fixed Huffman tables
f.hl = &fixedHuffmanDecoder
f.hd = nil
- f.huffmanBlock()
+ f.huffmanBlockDecoder()()
case 2:
// compressed, dynamic Huffman tables
if f.err = f.readHuffman(); f.err != nil {
@@ -324,7 +345,7 @@
}
f.hl = &f.h1
f.hd = &f.h2
- f.huffmanBlock()
+ f.huffmanBlockDecoder()()
default:
// 3 is reserved.
f.err = CorruptInputError(f.roffset)
@@ -460,12 +481,18 @@
return CorruptInputError(f.roffset)
}

- // As an optimization, we can initialize the min bits to read at a time
+ // As an optimization, we can initialize the maxRead bits to read at a time
// for the HLIT tree to the length of the EOB marker since we know that
// every block must terminate with one. This preserves the property that
// we never read any extra bytes after the end of the DEFLATE stream.
- if f.h1.min < f.bits[endBlockMarker] {
- f.h1.min = f.bits[endBlockMarker]
+ if f.h1.maxRead < f.bits[endBlockMarker] {
+ f.h1.maxRead = f.bits[endBlockMarker]
+ }
+ if !f.final {
+ // If not the final block, the smallest block possible is
+ // a predefined table, BTYPE=01, with a single EOB marker.
+ // This will take up 3 + 7 bits.
+ f.h1.maxRead += 10
}

return nil
@@ -475,7 +502,7 @@
// hl and hd are the Huffman states for the lit/length values
// and the distance values, respectively. If hd == nil, using the
// fixed distance encoding associated with fixed Huffman blocks.
-func (f *decompressor) huffmanBlock() {
+func (f *decompressor) huffmanBlockGeneric() {
const (
stateInit = iota // Zero value must be stateInit
stateDict
@@ -491,19 +518,61 @@
readLiteral:
// Read literal and/or (length, distance) according to RFC section 3.2.3.
{
- v, err := f.huffSym(f.hl)
- if err != nil {
- f.err = err
- return
+ var v int
+ {
+ // Inlined v, err := f.huffSym(f.hl)
+ // Since a huffmanDecoder can be empty or be composed of a degenerate tree
+ // with single element, huffSym must error on these two edge cases. In both
+ // cases, the chunks slice will be 0 for the invalid sequence, leading it
+ // satisfy the n == 0 check below.
+ n := uint(f.hl.maxRead)
+ // Optimization. Compiler isn't smart enough to keep f.b,f.nb in registers,
+ // but is smart enough to keep local variables in registers, so use nb and b,
+ // inline call to moreBits and reassign b,nb back to f on return.
+ nb, b := f.nb, f.b
+ for {
+ for nb < n {
+ c, err := f.r.ReadByte()
+ if err != nil {
+ f.b = b
+ f.nb = nb
+ f.err = noEOF(err)
+ return
+ }
+ f.roffset++
+ b |= uint32(c) << (nb & 31)
+ nb += 8
+ }
+ chunk := f.hl.chunks[b&(huffmanNumChunks-1)]
+ n = uint(chunk & huffmanCountMask)
+ if n > huffmanChunkBits {
+ chunk = f.hl.links[chunk>>huffmanValueShift][(b>>huffmanChunkBits)&f.hl.linkMask]
+ n = uint(chunk & huffmanCountMask)
+ }
+ if n <= nb {
+ if n == 0 {
+ f.b = b
+ f.nb = nb
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+ f.b = b >> (n & 31)
+ f.nb = nb - n
+ v = int(chunk >> huffmanValueShift)
+ break
+ }
+ }
}
+
var n uint // number of bits extra
var length int
+ var err error
switch {
case v < 256:
f.dict.writeByte(byte(v))
if f.dict.availWrite() == 0 {
f.toRead = f.dict.readFlush()
- f.step = (*decompressor).huffmanBlock
+ f.step = (*decompressor).huffmanBlockGeneric
f.stepState = stateInit
return
}
@@ -610,7 +679,7 @@

if f.dict.availWrite() == 0 || f.copyLen > 0 {
f.toRead = f.dict.readFlush()
- f.step = (*decompressor).huffmanBlock // We need to continue this work
+ f.step = (*decompressor).huffmanBlockGeneric // We need to continue this work
f.stepState = stateDict
return
}
@@ -621,20 +690,31 @@
// Copy a single uncompressed data block from input to output.
func (f *decompressor) dataBlock() {
// Uncompressed.
- // Discard current half-byte.
- f.nb = 0
- f.b = 0
+ // Discard current partial byte.
+ left := (f.nb) & 7
+ f.nb -= left
+ f.b >>= left
+
+ offBytes := f.nb >> 3
+ // Unfilled values will be overwritten.
+ f.buf[0] = uint8(f.b)
+ f.buf[1] = uint8(f.b >> 8)
+ f.buf[2] = uint8(f.b >> 16)
+ f.buf[3] = uint8(f.b >> 24)
+
+ f.roffset += int64(offBytes)
+ f.nb, f.b = 0, 0

// Length then ones-complement of length.
- nr, err := io.ReadFull(f.r, f.buf[0:4])
+ nr, err := io.ReadFull(f.r, f.buf[offBytes:4])
f.roffset += int64(nr)
if err != nil {
f.err = noEOF(err)
return
}
- n := int(f.buf[0]) | int(f.buf[1])<<8
- nn := int(f.buf[2]) | int(f.buf[3])<<8
- if uint16(nn) != uint16(^n) {
+ n := uint16(f.buf[0]) | uint16(f.buf[1])<<8
+ nn := uint16(f.buf[2]) | uint16(f.buf[3])<<8
+ if nn != ^n {
f.err = CorruptInputError(f.roffset)
return
}
@@ -645,7 +725,7 @@
return
}

- f.copyLen = n
+ f.copyLen = int(n)
f.copyData()
}

@@ -709,7 +789,7 @@
// with single element, huffSym must error on these two edge cases. In both
// cases, the chunks slice will be 0 for the invalid sequence, leading it
// satisfy the n == 0 check below.
- n := uint(h.min)
+ n := uint(h.maxRead)
// Optimization. Compiler isn't smart enough to keep f.b,f.nb in registers,
// but is smart enough to keep local variables in registers, so use nb and b,
// inline call to moreBits and reassign b,nb back to f on return.
@@ -778,6 +858,8 @@
r: makeReader(r),
bits: f.bits,
codebits: f.codebits,
+ h1: f.h1,
+ h2: f.h2,
dict: f.dict,
step: (*decompressor).nextBlock,
}
diff --git a/src/compress/flate/inflate_gen.go b/src/compress/flate/inflate_gen.go
new file mode 100644
index 0000000..b7e6727
--- /dev/null
+++ b/src/compress/flate/inflate_gen.go
@@ -0,0 +1,829 @@
+// Code generated by go generate gen_inflate.go. DO NOT EDIT.
+
+package flate
+
+import (
+ "bufio"
+ "bytes"
+ "math/bits"
+ "strings"
+)
+
+// huffmanBytesBuffer decodes a single Huffman block from f.
+// f.r must be a *bytes.Buffer.
+// hl and hd are the Huffman states for the lit/length values
+// and the distance values, respectively. If hd == nil, using the
+// fixed distance encoding associated with fixed Huffman blocks.
+func (f *decompressor) huffmanBytesBuffer() {
+ const (
+ stateInit = iota // Zero value must be stateInit
+ stateDict
+ )
+ fr := f.r.(*bytes.Buffer)
+ moreBits := func() error {
+ c, err := fr.ReadByte()
+ if err != nil {
+ return noEOF(err)
+ }
+ f.roffset++
+ f.b |= uint32(c) << f.nb
+ f.nb += 8
+ return nil
+ }
+
+ switch f.stepState {
+ case stateInit:
+ goto readLiteral
+ case stateDict:
+ goto copyHistory
+ }
+
+readLiteral:
+ // Read literal and/or (length, distance) according to RFC section 3.2.3.
+ {
+ var v int
+ {
+ // Inlined v, err := f.huffSym(f.hl)
+ // Since a huffmanDecoder can be empty or be composed of a degenerate tree
+ // with single element, huffSym must error on these two edge cases. In both
+ // cases, the chunks slice will be 0 for the invalid sequence, leading it
+ // satisfy the n == 0 check below.
+ n := uint(f.hl.maxRead)
+ // Optimization. Compiler isn't smart enough to keep f.b,f.nb in registers,
+ // but is smart enough to keep local variables in registers, so use nb and b,
+ // inline call to moreBits and reassign b,nb back to f on return.
+ nb, b := f.nb, f.b
+ for {
+ for nb < n {
+ c, err := fr.ReadByte()
+ if err != nil {
+ f.b = b
+ f.nb = nb
+ f.err = noEOF(err)
+ return
+ }
+ f.roffset++
+ b |= uint32(c) << (nb & 31)
+ nb += 8
+ }
+ chunk := f.hl.chunks[b&(huffmanNumChunks-1)]
+ n = uint(chunk & huffmanCountMask)
+ if n > huffmanChunkBits {
+ chunk = f.hl.links[chunk>>huffmanValueShift][(b>>huffmanChunkBits)&f.hl.linkMask]
+ n = uint(chunk & huffmanCountMask)
+ }
+ if n <= nb {
+ if n == 0 {
+ f.b = b
+ f.nb = nb
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+ f.b = b >> (n & 31)
+ f.nb = nb - n
+ v = int(chunk >> huffmanValueShift)
+ break
+ }
+ }
+ }
+
+ var n uint // number of bits extra
+ var length int
+ var err error
+ switch {
+ case v < 256:
+ f.dict.writeByte(byte(v))
+ if f.dict.availWrite() == 0 {
+ f.toRead = f.dict.readFlush()
+ f.step = (*decompressor).huffmanBytesBuffer
+ f.stepState = stateInit
+ return
+ }
+ goto readLiteral
+ case v == 256:
+ f.finishBlock()
+ return
+ // otherwise, reference to older data
+ case v < 265:
+ length = v - (257 - 3)
+ n = 0
+ case v < 269:
+ length = v*2 - (265*2 - 11)
+ n = 1
+ case v < 273:
+ length = v*4 - (269*4 - 19)
+ n = 2
+ case v < 277:
+ length = v*8 - (273*8 - 35)
+ n = 3
+ case v < 281:
+ length = v*16 - (277*16 - 67)
+ n = 4
+ case v < 285:
+ length = v*32 - (281*32 - 131)
+ n = 5
+ case v < maxNumLit:
+ length = 258
+ n = 0
+ default:
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+ if n > 0 {
+ for f.nb < n {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ length += int(f.b & uint32(1<<n-1))
+ f.b >>= n
+ f.nb -= n
+ }
+
+ var dist int
+ if f.hd == nil {
+ for f.nb < 5 {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ dist = int(bits.Reverse8(uint8(f.b & 0x1F << 3)))
+ f.b >>= 5
+ f.nb -= 5
+ } else {
+ if dist, err = f.huffSym(f.hd); err != nil {
+ f.err = err
+ return
+ }
+ }
+
+ switch {
+ case dist < 4:
+ dist++
+ case dist < maxNumDist:
+ nb := uint(dist-2) >> 1
+ // have 1 bit in bottom of dist, need nb more.
+ extra := (dist & 1) << nb
+ for f.nb < nb {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ extra |= int(f.b & uint32(1<<nb-1))
+ f.b >>= nb
+ f.nb -= nb
+ dist = 1<<(nb+1) + 1 + extra
+ default:
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+
+ // No check on length; encoding can be prescient.
+ if dist > f.dict.histSize() {
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+
+ f.copyLen, f.copyDist = length, dist
+ goto copyHistory
+ }
+
+copyHistory:
+ // Perform a backwards copy according to RFC section 3.2.3.
+ {
+ cnt := f.dict.tryWriteCopy(f.copyDist, f.copyLen)
+ if cnt == 0 {
+ cnt = f.dict.writeCopy(f.copyDist, f.copyLen)
+ }
+ f.copyLen -= cnt
+
+ if f.dict.availWrite() == 0 || f.copyLen > 0 {
+ f.toRead = f.dict.readFlush()
+ f.step = (*decompressor).huffmanBytesBuffer // We need to continue this work
+ f.stepState = stateDict
+ return
+ }
+ goto readLiteral
+ }
+}
+
+// huffmanBytesReader decodes a single Huffman block from f.
+// f.r must be a *bytes.Reader.
+// hl and hd are the Huffman states for the lit/length values
+// and the distance values, respectively. If hd == nil, using the
+// fixed distance encoding associated with fixed Huffman blocks.
+func (f *decompressor) huffmanBytesReader() {
+ const (
+ stateInit = iota // Zero value must be stateInit
+ stateDict
+ )
+ fr := f.r.(*bytes.Reader)
+ moreBits := func() error {
+ c, err := fr.ReadByte()
+ if err != nil {
+ return noEOF(err)
+ }
+ f.roffset++
+ f.b |= uint32(c) << f.nb
+ f.nb += 8
+ return nil
+ }
+
+ switch f.stepState {
+ case stateInit:
+ goto readLiteral
+ case stateDict:
+ goto copyHistory
+ }
+
+readLiteral:
+ // Read literal and/or (length, distance) according to RFC section 3.2.3.
+ {
+ var v int
+ {
+ // Inlined v, err := f.huffSym(f.hl)
+ // Since a huffmanDecoder can be empty or be composed of a degenerate tree
+ // with single element, huffSym must error on these two edge cases. In both
+ // cases, the chunks slice will be 0 for the invalid sequence, leading it
+ // satisfy the n == 0 check below.
+ n := uint(f.hl.maxRead)
+ // Optimization. Compiler isn't smart enough to keep f.b,f.nb in registers,
+ // but is smart enough to keep local variables in registers, so use nb and b,
+ // inline call to moreBits and reassign b,nb back to f on return.
+ nb, b := f.nb, f.b
+ for {
+ for nb < n {
+ c, err := fr.ReadByte()
+ if err != nil {
+ f.b = b
+ f.nb = nb
+ f.err = noEOF(err)
+ return
+ }
+ f.roffset++
+ b |= uint32(c) << (nb & 31)
+ nb += 8
+ }
+ chunk := f.hl.chunks[b&(huffmanNumChunks-1)]
+ n = uint(chunk & huffmanCountMask)
+ if n > huffmanChunkBits {
+ chunk = f.hl.links[chunk>>huffmanValueShift][(b>>huffmanChunkBits)&f.hl.linkMask]
+ n = uint(chunk & huffmanCountMask)
+ }
+ if n <= nb {
+ if n == 0 {
+ f.b = b
+ f.nb = nb
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+ f.b = b >> (n & 31)
+ f.nb = nb - n
+ v = int(chunk >> huffmanValueShift)
+ break
+ }
+ }
+ }
+
+ var n uint // number of bits extra
+ var length int
+ var err error
+ switch {
+ case v < 256:
+ f.dict.writeByte(byte(v))
+ if f.dict.availWrite() == 0 {
+ f.toRead = f.dict.readFlush()
+ f.step = (*decompressor).huffmanBytesReader
+ f.stepState = stateInit
+ return
+ }
+ goto readLiteral
+ case v == 256:
+ f.finishBlock()
+ return
+ // otherwise, reference to older data
+ case v < 265:
+ length = v - (257 - 3)
+ n = 0
+ case v < 269:
+ length = v*2 - (265*2 - 11)
+ n = 1
+ case v < 273:
+ length = v*4 - (269*4 - 19)
+ n = 2
+ case v < 277:
+ length = v*8 - (273*8 - 35)
+ n = 3
+ case v < 281:
+ length = v*16 - (277*16 - 67)
+ n = 4
+ case v < 285:
+ length = v*32 - (281*32 - 131)
+ n = 5
+ case v < maxNumLit:
+ length = 258
+ n = 0
+ default:
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+ if n > 0 {
+ for f.nb < n {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ length += int(f.b & uint32(1<<n-1))
+ f.b >>= n
+ f.nb -= n
+ }
+
+ var dist int
+ if f.hd == nil {
+ for f.nb < 5 {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ dist = int(bits.Reverse8(uint8(f.b & 0x1F << 3)))
+ f.b >>= 5
+ f.nb -= 5
+ } else {
+ if dist, err = f.huffSym(f.hd); err != nil {
+ f.err = err
+ return
+ }
+ }
+
+ switch {
+ case dist < 4:
+ dist++
+ case dist < maxNumDist:
+ nb := uint(dist-2) >> 1
+ // have 1 bit in bottom of dist, need nb more.
+ extra := (dist & 1) << nb
+ for f.nb < nb {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ extra |= int(f.b & uint32(1<<nb-1))
+ f.b >>= nb
+ f.nb -= nb
+ dist = 1<<(nb+1) + 1 + extra
+ default:
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+
+ // No check on length; encoding can be prescient.
+ if dist > f.dict.histSize() {
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+
+ f.copyLen, f.copyDist = length, dist
+ goto copyHistory
+ }
+
+copyHistory:
+ // Perform a backwards copy according to RFC section 3.2.3.
+ {
+ cnt := f.dict.tryWriteCopy(f.copyDist, f.copyLen)
+ if cnt == 0 {
+ cnt = f.dict.writeCopy(f.copyDist, f.copyLen)
+ }
+ f.copyLen -= cnt
+
+ if f.dict.availWrite() == 0 || f.copyLen > 0 {
+ f.toRead = f.dict.readFlush()
+ f.step = (*decompressor).huffmanBytesReader // We need to continue this work
+ f.stepState = stateDict
+ return
+ }
+ goto readLiteral
+ }
+}
+
+// huffmanBufioReader decodes a single Huffman block from f.
+// f.r must be a *bufio.Reader.
+// hl and hd are the Huffman states for the lit/length values
+// and the distance values, respectively. If hd == nil, using the
+// fixed distance encoding associated with fixed Huffman blocks.
+func (f *decompressor) huffmanBufioReader() {
+ const (
+ stateInit = iota // Zero value must be stateInit
+ stateDict
+ )
+ fr := f.r.(*bufio.Reader)
+ moreBits := func() error {
+ c, err := fr.ReadByte()
+ if err != nil {
+ return noEOF(err)
+ }
+ f.roffset++
+ f.b |= uint32(c) << f.nb
+ f.nb += 8
+ return nil
+ }
+
+ switch f.stepState {
+ case stateInit:
+ goto readLiteral
+ case stateDict:
+ goto copyHistory
+ }
+
+readLiteral:
+ // Read literal and/or (length, distance) according to RFC section 3.2.3.
+ {
+ var v int
+ {
+ // Inlined v, err := f.huffSym(f.hl)
+ // Since a huffmanDecoder can be empty or be composed of a degenerate tree
+ // with single element, huffSym must error on these two edge cases. In both
+ // cases, the chunks slice will be 0 for the invalid sequence, leading it
+ // satisfy the n == 0 check below.
+ n := uint(f.hl.maxRead)
+ // Optimization. Compiler isn't smart enough to keep f.b,f.nb in registers,
+ // but is smart enough to keep local variables in registers, so use nb and b,
+ // inline call to moreBits and reassign b,nb back to f on return.
+ nb, b := f.nb, f.b
+ for {
+ for nb < n {
+ c, err := fr.ReadByte()
+ if err != nil {
+ f.b = b
+ f.nb = nb
+ f.err = noEOF(err)
+ return
+ }
+ f.roffset++
+ b |= uint32(c) << (nb & 31)
+ nb += 8
+ }
+ chunk := f.hl.chunks[b&(huffmanNumChunks-1)]
+ n = uint(chunk & huffmanCountMask)
+ if n > huffmanChunkBits {
+ chunk = f.hl.links[chunk>>huffmanValueShift][(b>>huffmanChunkBits)&f.hl.linkMask]
+ n = uint(chunk & huffmanCountMask)
+ }
+ if n <= nb {
+ if n == 0 {
+ f.b = b
+ f.nb = nb
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+ f.b = b >> (n & 31)
+ f.nb = nb - n
+ v = int(chunk >> huffmanValueShift)
+ break
+ }
+ }
+ }
+
+ var n uint // number of bits extra
+ var length int
+ var err error
+ switch {
+ case v < 256:
+ f.dict.writeByte(byte(v))
+ if f.dict.availWrite() == 0 {
+ f.toRead = f.dict.readFlush()
+ f.step = (*decompressor).huffmanBufioReader
+ f.stepState = stateInit
+ return
+ }
+ goto readLiteral
+ case v == 256:
+ f.finishBlock()
+ return
+ // otherwise, reference to older data
+ case v < 265:
+ length = v - (257 - 3)
+ n = 0
+ case v < 269:
+ length = v*2 - (265*2 - 11)
+ n = 1
+ case v < 273:
+ length = v*4 - (269*4 - 19)
+ n = 2
+ case v < 277:
+ length = v*8 - (273*8 - 35)
+ n = 3
+ case v < 281:
+ length = v*16 - (277*16 - 67)
+ n = 4
+ case v < 285:
+ length = v*32 - (281*32 - 131)
+ n = 5
+ case v < maxNumLit:
+ length = 258
+ n = 0
+ default:
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+ if n > 0 {
+ for f.nb < n {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ length += int(f.b & uint32(1<<n-1))
+ f.b >>= n
+ f.nb -= n
+ }
+
+ var dist int
+ if f.hd == nil {
+ for f.nb < 5 {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ dist = int(bits.Reverse8(uint8(f.b & 0x1F << 3)))
+ f.b >>= 5
+ f.nb -= 5
+ } else {
+ if dist, err = f.huffSym(f.hd); err != nil {
+ f.err = err
+ return
+ }
+ }
+
+ switch {
+ case dist < 4:
+ dist++
+ case dist < maxNumDist:
+ nb := uint(dist-2) >> 1
+ // have 1 bit in bottom of dist, need nb more.
+ extra := (dist & 1) << nb
+ for f.nb < nb {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ extra |= int(f.b & uint32(1<<nb-1))
+ f.b >>= nb
+ f.nb -= nb
+ dist = 1<<(nb+1) + 1 + extra
+ default:
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+
+ // No check on length; encoding can be prescient.
+ if dist > f.dict.histSize() {
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+
+ f.copyLen, f.copyDist = length, dist
+ goto copyHistory
+ }
+
+copyHistory:
+ // Perform a backwards copy according to RFC section 3.2.3.
+ {
+ cnt := f.dict.tryWriteCopy(f.copyDist, f.copyLen)
+ if cnt == 0 {
+ cnt = f.dict.writeCopy(f.copyDist, f.copyLen)
+ }
+ f.copyLen -= cnt
+
+ if f.dict.availWrite() == 0 || f.copyLen > 0 {
+ f.toRead = f.dict.readFlush()
+ f.step = (*decompressor).huffmanBufioReader // We need to continue this work
+ f.stepState = stateDict
+ return
+ }
+ goto readLiteral
+ }
+}
+
+// huffmanStringsReader decodes a single Huffman block from f.
+// f.r must be a *strings.Reader.
+// hl and hd are the Huffman states for the lit/length values
+// and the distance values, respectively. If hd == nil, using the
+// fixed distance encoding associated with fixed Huffman blocks.
+func (f *decompressor) huffmanStringsReader() {
+ const (
+ stateInit = iota // Zero value must be stateInit
+ stateDict
+ )
+ fr := f.r.(*strings.Reader)
+ moreBits := func() error {
+ c, err := fr.ReadByte()
+ if err != nil {
+ return noEOF(err)
+ }
+ f.roffset++
+ f.b |= uint32(c) << f.nb
+ f.nb += 8
+ return nil
+ }
+
+ switch f.stepState {
+ case stateInit:
+ goto readLiteral
+ case stateDict:
+ goto copyHistory
+ }
+
+readLiteral:
+ // Read literal and/or (length, distance) according to RFC section 3.2.3.
+ {
+ var v int
+ {
+ // Inlined v, err := f.huffSym(f.hl)
+ // Since a huffmanDecoder can be empty or be composed of a degenerate tree
+ // with single element, huffSym must error on these two edge cases. In both
+ // cases, the chunks slice will be 0 for the invalid sequence, leading it
+ // satisfy the n == 0 check below.
+ n := uint(f.hl.maxRead)
+ // Optimization. Compiler isn't smart enough to keep f.b,f.nb in registers,
+ // but is smart enough to keep local variables in registers, so use nb and b,
+ // inline call to moreBits and reassign b,nb back to f on return.
+ nb, b := f.nb, f.b
+ for {
+ for nb < n {
+ c, err := fr.ReadByte()
+ if err != nil {
+ f.b = b
+ f.nb = nb
+ f.err = noEOF(err)
+ return
+ }
+ f.roffset++
+ b |= uint32(c) << (nb & 31)
+ nb += 8
+ }
+ chunk := f.hl.chunks[b&(huffmanNumChunks-1)]
+ n = uint(chunk & huffmanCountMask)
+ if n > huffmanChunkBits {
+ chunk = f.hl.links[chunk>>huffmanValueShift][(b>>huffmanChunkBits)&f.hl.linkMask]
+ n = uint(chunk & huffmanCountMask)
+ }
+ if n <= nb {
+ if n == 0 {
+ f.b = b
+ f.nb = nb
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+ f.b = b >> (n & 31)
+ f.nb = nb - n
+ v = int(chunk >> huffmanValueShift)
+ break
+ }
+ }
+ }
+
+ var n uint // number of bits extra
+ var length int
+ var err error
+ switch {
+ case v < 256:
+ f.dict.writeByte(byte(v))
+ if f.dict.availWrite() == 0 {
+ f.toRead = f.dict.readFlush()
+ f.step = (*decompressor).huffmanStringsReader
+ f.stepState = stateInit
+ return
+ }
+ goto readLiteral
+ case v == 256:
+ f.finishBlock()
+ return
+ // otherwise, reference to older data
+ case v < 265:
+ length = v - (257 - 3)
+ n = 0
+ case v < 269:
+ length = v*2 - (265*2 - 11)
+ n = 1
+ case v < 273:
+ length = v*4 - (269*4 - 19)
+ n = 2
+ case v < 277:
+ length = v*8 - (273*8 - 35)
+ n = 3
+ case v < 281:
+ length = v*16 - (277*16 - 67)
+ n = 4
+ case v < 285:
+ length = v*32 - (281*32 - 131)
+ n = 5
+ case v < maxNumLit:
+ length = 258
+ n = 0
+ default:
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+ if n > 0 {
+ for f.nb < n {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ length += int(f.b & uint32(1<<n-1))
+ f.b >>= n
+ f.nb -= n
+ }
+
+ var dist int
+ if f.hd == nil {
+ for f.nb < 5 {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ dist = int(bits.Reverse8(uint8(f.b & 0x1F << 3)))
+ f.b >>= 5
+ f.nb -= 5
+ } else {
+ if dist, err = f.huffSym(f.hd); err != nil {
+ f.err = err
+ return
+ }
+ }
+
+ switch {
+ case dist < 4:
+ dist++
+ case dist < maxNumDist:
+ nb := uint(dist-2) >> 1
+ // have 1 bit in bottom of dist, need nb more.
+ extra := (dist & 1) << nb
+ for f.nb < nb {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ extra |= int(f.b & uint32(1<<nb-1))
+ f.b >>= nb
+ f.nb -= nb
+ dist = 1<<(nb+1) + 1 + extra
+ default:
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+
+ // No check on length; encoding can be prescient.
+ if dist > f.dict.histSize() {
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+
+ f.copyLen, f.copyDist = length, dist
+ goto copyHistory
+ }
+
+copyHistory:
+ // Perform a backwards copy according to RFC section 3.2.3.
+ {
+ cnt := f.dict.tryWriteCopy(f.copyDist, f.copyLen)
+ if cnt == 0 {
+ cnt = f.dict.writeCopy(f.copyDist, f.copyLen)
+ }
+ f.copyLen -= cnt
+
+ if f.dict.availWrite() == 0 || f.copyLen > 0 {
+ f.toRead = f.dict.readFlush()
+ f.step = (*decompressor).huffmanStringsReader // We need to continue this work
+ f.stepState = stateDict
+ return
+ }
+ goto readLiteral
+ }
+}
+
+func (f *decompressor) huffmanBlockDecoder() func() {
+ switch f.r.(type) {
+ case *bytes.Buffer:
+ return f.huffmanBytesBuffer
+ case *bytes.Reader:
+ return f.huffmanBytesReader
+ case *bufio.Reader:
+ return f.huffmanBufioReader
+ case *strings.Reader:
+ return f.huffmanStringsReader
+ default:
+ return f.huffmanBlockGeneric
+ }
+}
diff --git a/src/compress/flate/reader_test.go b/src/compress/flate/reader_test.go
index 9d2943a..cac99a8 100644
--- a/src/compress/flate/reader_test.go
+++ b/src/compress/flate/reader_test.go
@@ -51,10 +51,14 @@
w.Close()
buf1 := compressed.Bytes()
buf0, compressed, w = nil, nil, nil
+ src := bytes.NewReader(buf1)
+ dec := NewReader(src)
runtime.GC()
b.StartTimer()
for i := 0; i < b.N; i++ {
- io.Copy(ioutil.Discard, NewReader(bytes.NewReader(buf1)))
+ src.Reset(buf1)
+ dec.(Resetter).Reset(src, nil)
+ io.Copy(ioutil.Discard, dec)
}
})
}

To view, visit change 227737. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
Gerrit-Change-Number: 227737
Gerrit-PatchSet: 1
Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
Gerrit-Reviewer: Klaus Post <klau...@gmail.com>
Gerrit-MessageType: newchange

Gerrit Bot (Gerrit)

unread,
Apr 9, 2020, 7:11:10 AM4/9/20
to Klaus Post, goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

Gerrit Bot uploaded patch set #2 to this change.

To view, visit change 227737. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
Gerrit-Change-Number: 227737
Gerrit-PatchSet: 2
Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
Gerrit-Reviewer: Klaus Post <klau...@gmail.com>
Gerrit-MessageType: newpatchset

Gopher Robot (Gerrit)

unread,
Dec 15, 2021, 2:25:24 PM12/15/21
to Gerrit Dou, Klaus Post, goph...@pubsubhelper.golang.org, Matthew Dempsky, Bryan Mills, Ian Lance Taylor, golang-co...@googlegroups.com

Gopher Robot abandoned this change.

View Change

Abandoned GitHub PR golang/go#38324 has been closed.

To view, visit change 227737. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
Gerrit-Change-Number: 227737
Gerrit-PatchSet: 2
Gerrit-Owner: Gerrit Dou <letsus...@gmail.com>
Gerrit-Reviewer: Joe Tsai <joe...@google.com>
Gerrit-Reviewer: Klaus Post <klau...@gmail.com>
Gerrit-Reviewer: Matthew Dempsky <mdem...@google.com>
Gerrit-CC: Bryan Mills <bcm...@google.com>
Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
Gerrit-MessageType: abandon

Heschi Kreinick (Gerrit)

unread,
Dec 16, 2021, 1:02:05 PM12/16/21
to Gerrit Dou, Klaus Post, goph...@pubsubhelper.golang.org, Matthew Dempsky, Bryan Mills, Ian Lance Taylor, golang-co...@googlegroups.com

Heschi Kreinick restored this change.

View Change

To view, visit change 227737. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
Gerrit-Change-Number: 227737
Gerrit-PatchSet: 2
Gerrit-Owner: Gerrit Dou <letsus...@gmail.com>
Gerrit-Reviewer: Joe Tsai <joe...@google.com>
Gerrit-Reviewer: Klaus Post <klau...@gmail.com>
Gerrit-Reviewer: Matthew Dempsky <mdem...@google.com>
Gerrit-CC: Bryan Mills <bcm...@google.com>
Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
Gerrit-MessageType: restore

Ian Lance Taylor (Gerrit)

unread,
May 6, 2022, 8:02:54 PM5/6/22
to Gerrit Bot, Klaus Post, goph...@pubsubhelper.golang.org, Matthew Dempsky, Bryan Mills, Ian Lance Taylor, golang-co...@googlegroups.com

View Change

1 comment:

  • Patchset:

    • Patch Set #2:

      Sorry this has sat around for so long. Any interest in fixing the merge conflicts? Thanks.

To view, visit change 227737. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
Gerrit-Change-Number: 227737
Gerrit-PatchSet: 2
Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
Gerrit-Reviewer: Joe Tsai <joe...@google.com>
Gerrit-Reviewer: Klaus Post <klau...@gmail.com>
Gerrit-Reviewer: Matthew Dempsky <mdem...@google.com>
Gerrit-CC: Bryan Mills <bcm...@google.com>
Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
Gerrit-Comment-Date: Sat, 07 May 2022 00:02:50 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Gerrit-MessageType: comment

Klaus Post (Gerrit)

unread,
May 7, 2022, 3:54:30 AM5/7/22
to Gerrit Bot, goph...@pubsubhelper.golang.org, Matthew Dempsky, Bryan Mills, Ian Lance Taylor, golang-co...@googlegroups.com

Attention is currently required from: Ian Lance Taylor.

View Change

1 comment:

  • Patchset:

    • Patch Set #2:

      Sorry this has sat around for so long. Any interest in fixing the merge conflicts? Thanks.

    • Sure. I have an additional 5-10% I found since then, which I will see if I can incorporate.

      Maybe this could be achieved through generics instead of codegen. I will investigate a bit, since I haven't done anything with generics yet.

To view, visit change 227737. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
Gerrit-Change-Number: 227737
Gerrit-PatchSet: 2
Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
Gerrit-Reviewer: Joe Tsai <joe...@google.com>
Gerrit-Reviewer: Klaus Post <klau...@gmail.com>
Gerrit-Reviewer: Matthew Dempsky <mdem...@google.com>
Gerrit-CC: Bryan Mills <bcm...@google.com>
Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
Gerrit-Attention: Ian Lance Taylor <ia...@golang.org>
Gerrit-Comment-Date: Sat, 07 May 2022 07:54:25 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Comment-In-Reply-To: Ian Lance Taylor <ia...@golang.org>
Gerrit-MessageType: comment

Gerrit Bot (Gerrit)

unread,
Jun 7, 2022, 7:23:07 AM6/7/22
to Klaus Post, goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

Attention is currently required from: Ian Lance Taylor.

Gerrit Bot uploaded patch set #3 to this change.

View Change

Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
GitHub-Last-Rev: c00babd617fabf290f784c5bd074a82a6af781c5

GitHub-Pull-Request: golang/go#38324
---
A src/compress/flate/gen_inflate.go
M src/compress/flate/inflate.go
A src/compress/flate/inflate_gen.go
M src/compress/flate/reader_test.go
4 files changed, 1,386 insertions(+), 46 deletions(-)

To view, visit change 227737. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
Gerrit-Change-Number: 227737
Gerrit-PatchSet: 3
Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
Gerrit-Reviewer: Joe Tsai <joe...@google.com>
Gerrit-Reviewer: Matthew Dempsky <mdem...@google.com>
Gerrit-CC: Bryan Mills <bcm...@google.com>
Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
Gerrit-CC: Klaus Post <klau...@gmail.com>
Gerrit-Attention: Ian Lance Taylor <ia...@golang.org>
Gerrit-MessageType: newpatchset

Gerrit Bot (Gerrit)

unread,
Jun 7, 2022, 8:12:08 AM6/7/22
to Klaus Post, goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

Attention is currently required from: Ian Lance Taylor.

Gerrit Bot uploaded patch set #4 to this change.

View Change

Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
GitHub-Last-Rev: ae9b62a9851eb3d8acf14cebfef1d8c663e407dc

GitHub-Pull-Request: golang/go#38324
---
A src/compress/flate/gen_inflate.go
M src/compress/flate/inflate.go
A src/compress/flate/inflate_gen.go
M src/compress/flate/reader_test.go
4 files changed, 1,665 insertions(+), 186 deletions(-)

To view, visit change 227737. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
Gerrit-Change-Number: 227737
Gerrit-PatchSet: 4
Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>

Gerrit Bot (Gerrit)

unread,
Jun 7, 2022, 9:10:34 AM6/7/22
to Klaus Post, goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

Attention is currently required from: Ian Lance Taylor.

Gerrit Bot uploaded patch set #5 to this change.

View Change

Decode/Digits/Huffman/1e4-32        63.8µs ± 0%    41.3µs ± 0%   -35.22%  (p=0.000 n=10+10)
Decode/Digits/Huffman/1e5-32 625µs ± 0% 404µs ± 0% -35.31% (p=0.000 n=10+10)
Decode/Digits/Huffman/1e6-32 6.25ms ± 0% 4.02ms ± 1% -35.64% (p=0.000 n=10+10)
Decode/Digits/Speed/1e4-32 72.1µs ± 1% 48.0µs ± 0% -33.36% (p=0.000 n=10+9)
Decode/Digits/Speed/1e5-32 792µs ± 1% 578µs ± 1% -27.04% (p=0.000 n=10+10)
Decode/Digits/Speed/1e6-32 8.09ms ± 0% 5.85ms ± 0% -27.68% (p=0.000 n=9+10)
Decode/Digits/Default/1e4-32 74.1µs ± 1% 49.7µs ± 1% -32.87% (p=0.000 n=10+9)
Decode/Digits/Default/1e5-32 775µs ± 1% 579µs ± 0% -25.35% (p=0.000 n=10+10)
Decode/Digits/Default/1e6-32 7.84ms ± 1% 5.84ms ± 1% -25.59% (p=0.000 n=10+10)
Decode/Digits/Compression/1e4-32 74.1µs ± 0% 49.8µs ± 0% -32.83% (p=0.000 n=10+10)
Decode/Digits/Compression/1e5-32 777µs ± 1% 579µs ± 0% -25.47% (p=0.000 n=10+10)
Decode/Digits/Compression/1e6-32 7.83ms ± 1% 5.83ms ± 0% -25.59% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e4-32 72.9µs ± 0% 45.6µs ± 1% -37.48% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e5-32 712µs ± 1% 471µs ± 1% -33.92% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e6-32 7.11ms ± 0% 4.70ms ± 1% -33.98% (p=0.000 n=10+10)
Decode/Newton/Speed/1e4-32 67.0µs ± 1% 45.4µs ± 1% -32.19% (p=0.000 n=10+10)
Decode/Newton/Speed/1e5-32 616µs ± 1% 447µs ± 0% -27.49% (p=0.000 n=10+10)
Decode/Newton/Speed/1e6-32 6.17ms ± 0% 4.50ms ± 0% -26.98% (p=0.000 n=10+10)
Decode/Newton/Default/1e4-32 60.7µs ± 0% 39.6µs ± 0% -34.84% (p=0.000 n=10+10)
Decode/Newton/Default/1e5-32 492µs ± 0% 360µs ± 0% -26.84% (p=0.000 n=10+10)
Decode/Newton/Default/1e6-32 4.87ms ± 1% 3.59ms ± 0% -26.34% (p=0.000 n=10+10)
Decode/Newton/Compression/1e4-32 60.8µs ± 1% 39.6µs ± 1% -34.92% (p=0.000 n=10+10)
Decode/Newton/Compression/1e5-32 491µs ± 1% 357µs ± 1% -27.23% (p=0.000 n=10+10)
Decode/Newton/Compression/1e6-32 4.84ms ± 0% 3.58ms ± 0% -26.17% (p=0.000 n=10+10)


name old speed new speed delta
Decode/Digits/Huffman/1e4-32       157MB/s ± 0%   242MB/s ± 0%   +54.37%  (p=0.000 n=10+10)
Decode/Digits/Huffman/1e5-32 160MB/s ± 0% 247MB/s ± 0% +54.58% (p=0.000 n=10+10)
Decode/Digits/Huffman/1e6-32 160MB/s ± 0% 249MB/s ± 1% +55.39% (p=0.000 n=10+10)
Decode/Digits/Speed/1e4-32 139MB/s ± 1% 208MB/s ± 0% +50.06% (p=0.000 n=10+9)
Decode/Digits/Speed/1e5-32 126MB/s ± 1% 173MB/s ± 1% +37.05% (p=0.000 n=10+10)
Decode/Digits/Speed/1e6-32 124MB/s ± 0% 171MB/s ± 0% +38.28% (p=0.000 n=9+10)
Decode/Digits/Default/1e4-32 135MB/s ± 1% 201MB/s ± 1% +48.95% (p=0.000 n=10+9)
Decode/Digits/Default/1e5-32 129MB/s ± 1% 173MB/s ± 0% +33.95% (p=0.000 n=10+10)
Decode/Digits/Default/1e6-32 127MB/s ± 1% 171MB/s ± 1% +34.39% (p=0.000 n=10+10)
Decode/Digits/Compression/1e4-32 135MB/s ± 0% 201MB/s ± 0% +48.88% (p=0.000 n=10+10)
Decode/Digits/Compression/1e5-32 129MB/s ± 1% 173MB/s ± 0% +34.17% (p=0.000 n=10+10)
Decode/Digits/Compression/1e6-32 128MB/s ± 1% 172MB/s ± 0% +34.39% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e4-32 137MB/s ± 0% 219MB/s ± 1% +59.96% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e5-32 140MB/s ± 1% 212MB/s ± 1% +51.32% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e6-32 141MB/s ± 0% 213MB/s ± 1% +51.46% (p=0.000 n=10+10)
Decode/Newton/Speed/1e4-32 149MB/s ± 1% 220MB/s ± 1% +47.48% (p=0.000 n=10+10)
Decode/Newton/Speed/1e5-32 162MB/s ± 1% 224MB/s ± 0% +37.92% (p=0.000 n=10+10)
Decode/Newton/Speed/1e6-32 162MB/s ± 0% 222MB/s ± 0% +36.95% (p=0.000 n=10+10)
Decode/Newton/Default/1e4-32 165MB/s ± 0% 253MB/s ± 0% +53.47% (p=0.000 n=10+10)
Decode/Newton/Default/1e5-32 203MB/s ± 0% 278MB/s ± 0% +36.68% (p=0.000 n=10+10)
Decode/Newton/Default/1e6-32 205MB/s ± 1% 279MB/s ± 0% +35.77% (p=0.000 n=10+10)
Decode/Newton/Compression/1e4-32 164MB/s ± 1% 253MB/s ± 1% +53.66% (p=0.000 n=10+10)
Decode/Newton/Compression/1e5-32 204MB/s ± 1% 280MB/s ± 1% +37.41% (p=0.000 n=10+10)
Decode/Newton/Compression/1e6-32 206MB/s ± 0% 280MB/s ± 0% +35.44% (p=0.000 n=10+10)


name old alloc/op new alloc/op delta
Decode/Digits/Huffman/1e4-32        0.00B ±NaN%    16.00B ± 0%     +Inf%  (p=0.000 n=10+10)
Decode/Digits/Huffman/1e5-32 6.00B ± 0% 36.00B ± 0% +500.00% (p=0.000 n=10+10)
Decode/Digits/Huffman/1e6-32 64.0B ± 0% 304.0B ± 0% +375.00% (p=0.000 n=10+9)
Decode/Digits/Speed/1e4-32 80.0B ± 0% 16.0B ± 0% -80.00% (p=0.000 n=10+10)
Decode/Digits/Speed/1e5-32 296B ± 0% 39B ± 0% -86.82% (p=0.000 n=10+8)
Decode/Digits/Speed/1e6-32 3.78kB ± 0% 0.33kB ± 0% -91.29% (p=0.000 n=10+10)
Decode/Digits/Default/1e4-32 40.0B ± 0% 16.0B ± 0% -60.00% (p=0.000 n=10+10)
Decode/Digits/Default/1e5-32 287B ± 0% 54B ± 1% -81.04% (p=0.000 n=10+10)
Decode/Digits/Default/1e6-32 4.15kB ± 0% 0.44kB ± 0% -89.43% (p=0.000 n=10+9)
Decode/Digits/Compression/1e4-32 40.0B ± 0% 16.0B ± 0% -60.00% (p=0.000 n=10+10)
Decode/Digits/Compression/1e5-32 288B ± 0% 55B ± 1% -81.01% (p=0.000 n=10+10)
Decode/Digits/Compression/1e6-32 4.15kB ± 0% 0.44kB ± 0% -89.43% (p=0.000 n=10+8)
Decode/Newton/Huffman/1e4-32 705B ± 0% 16B ± 0% -97.73% (p=0.000 n=9+10)
Decode/Newton/Huffman/1e5-32 4.49kB ± 0% 0.04kB ± 0% -99.15% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e6-32 39.4kB ± 0% 0.3kB ± 0% -99.18% (p=0.000 n=9+10)
Decode/Newton/Speed/1e4-32 617B ± 0% 16B ± 0% -97.41% (p=0.000 n=10+10)
Decode/Newton/Speed/1e5-32 3.19kB ± 0% 0.04kB ± 0% -98.84% (p=0.000 n=10+10)
Decode/Newton/Speed/1e6-32 40.5kB ± 0% 0.3kB ± 0% -99.15% (p=0.000 n=10+10)
Decode/Newton/Default/1e4-32 513B ± 0% 16B ± 0% -96.88% (p=0.000 n=10+10)
Decode/Newton/Default/1e5-32 2.35kB ± 0% 0.04kB ± 0% -98.47% (p=0.000 n=10+10)
Decode/Newton/Default/1e6-32 21.1kB ± 0% 0.3kB ± 0% -98.80% (p=0.000 n=8+8)
Decode/Newton/Compression/1e4-32 513B ± 0% 16B ± 0% -96.88% (p=0.000 n=10+10)
Decode/Newton/Compression/1e5-32 2.35kB ± 0% 0.04kB ± 0% -98.47% (p=0.000 n=10+10)
Decode/Newton/Compression/1e6-32 22.9kB ± 0% 0.2kB ± 0% -98.92% (p=0.000 n=8+8)


name old allocs/op new allocs/op delta
Decode/Digits/Huffman/1e4-32         0.00 ±NaN%      1.00 ± 0%     +Inf%  (p=0.000 n=10+10)
Decode/Digits/Huffman/1e5-32 0.00 ±NaN% 2.00 ± 0% +Inf% (p=0.000 n=10+10)
Decode/Digits/Huffman/1e6-32 0.00 ±NaN% 16.00 ± 0% +Inf% (p=0.000 n=10+10)
Decode/Digits/Speed/1e4-32 3.00 ± 0% 1.00 ± 0% -66.67% (p=0.000 n=10+10)
Decode/Digits/Speed/1e5-32 6.00 ± 0% 2.00 ± 0% -66.67% (p=0.000 n=10+10)
Decode/Digits/Speed/1e6-32 68.0 ± 0% 16.0 ± 0% -76.47% (p=0.000 n=10+10)
Decode/Digits/Default/1e4-32 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10)
Decode/Digits/Default/1e5-32 8.00 ± 0% 3.00 ± 0% -62.50% (p=0.000 n=10+10)
Decode/Digits/Default/1e6-32 74.0 ± 0% 23.0 ± 0% -68.92% (p=0.000 n=10+10)
Decode/Digits/Compression/1e4-32 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10)
Decode/Digits/Compression/1e5-32 8.00 ± 0% 3.00 ± 0% -62.50% (p=0.000 n=10+10)
Decode/Digits/Compression/1e6-32 74.0 ± 0% 23.0 ± 0% -68.92% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e4-32 9.00 ± 0% 1.00 ± 0% -88.89% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e5-32 18.0 ± 0% 2.0 ± 0% -88.89% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e6-32 156 ± 0% 16 ± 0% -89.74% (p=0.000 n=10+10)
Decode/Newton/Speed/1e4-32 13.0 ± 0% 1.0 ± 0% -92.31% (p=0.000 n=10+10)
Decode/Newton/Speed/1e5-32 26.0 ± 0% 2.0 ± 0% -92.31% (p=0.000 n=10+10)
Decode/Newton/Speed/1e6-32 223 ± 0% 16 ± 0% -92.83% (p=0.000 n=10+10)
Decode/Newton/Default/1e4-32 10.0 ± 0% 1.0 ± 0% -90.00% (p=0.000 n=10+10)
Decode/Newton/Default/1e5-32 27.0 ± 0% 2.0 ± 0% -92.59% (p=0.000 n=10+10)
Decode/Newton/Default/1e6-32 153 ± 0% 12 ± 0% -92.16% (p=0.000 n=10+10)
Decode/Newton/Compression/1e4-32 10.0 ± 0% 1.0 ± 0% -90.00% (p=0.000 n=10+10)
Decode/Newton/Compression/1e5-32 27.0 ± 0% 2.0 ± 0% -92.59% (p=0.000 n=10+10)
Decode/Newton/Compression/1e6-32 145 ± 0% 12 ± 0% -91.72% (p=0.000 n=10+10)

```

These changes have been included in github.com/klauspost/compress
for a little more than a month now, which includes fuzz testing.

Change-Id: I7e346330512116baa27e448aa606a2f4e551054c

Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
GitHub-Last-Rev: ae9b62a9851eb3d8acf14cebfef1d8c663e407dc

GitHub-Pull-Request: golang/go#38324
---
A src/compress/flate/gen_inflate.go
M src/compress/flate/inflate.go
A src/compress/flate/inflate_gen.go
M src/compress/flate/reader_test.go
4 files changed, 1,665 insertions(+), 186 deletions(-)

To view, visit change 227737. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
Gerrit-Change-Number: 227737
Gerrit-PatchSet: 5
Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>

Klaus Post (Gerrit)

unread,
Jun 7, 2022, 9:15:48 AM6/7/22
to Gerrit Bot, goph...@pubsubhelper.golang.org, Matthew Dempsky, Bryan Mills, Ian Lance Taylor, golang-co...@googlegroups.com

Attention is currently required from: Ian Lance Taylor.

View Change

2 comments:

To view, visit change 227737. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
Gerrit-Change-Number: 227737
Gerrit-PatchSet: 5
Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
Gerrit-Reviewer: Joe Tsai <joe...@google.com>
Gerrit-Reviewer: Matthew Dempsky <mdem...@google.com>
Gerrit-CC: Bryan Mills <bcm...@google.com>
Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
Gerrit-CC: Klaus Post <klau...@gmail.com>
Gerrit-Attention: Ian Lance Taylor <ia...@golang.org>
Gerrit-Comment-Date: Tue, 07 Jun 2022 13:15:42 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Comment-In-Reply-To: Ian Lance Taylor <ia...@golang.org>
Comment-In-Reply-To: Klaus Post <klau...@gmail.com>
Gerrit-MessageType: comment

Gerrit Bot (Gerrit)

unread,
Jun 10, 2022, 7:09:48 AM6/10/22
to Klaus Post, goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

Attention is currently required from: Ian Lance Taylor.

Gerrit Bot uploaded patch set #6 to this change.

View Change

Decode/Digits/Huffman/1e4-32        63.8µs ± 0%    41.3µs ± 0%   -35.22%  (p=0.000 n=10+10)
Decode/Digits/Huffman/1e5-32 625µs ± 0% 404µs ± 0% -35.31% (p=0.000 n=10+10)
Decode/Digits/Huffman/1e6-32 6.25ms ± 0% 4.02ms ± 1% -35.64% (p=0.000 n=10+10)
Decode/Digits/Speed/1e4-32 72.1µs ± 1% 48.0µs ± 0% -33.36% (p=0.000 n=10+9)
Decode/Digits/Speed/1e5-32 792µs ± 1% 578µs ± 1% -27.04% (p=0.000 n=10+10)
Decode/Digits/Speed/1e6-32 8.09ms ± 0% 5.85ms ± 0% -27.68% (p=0.000 n=9+10)
Decode/Digits/Default/1e4-32 74.1µs ± 1% 49.7µs ± 1% -32.87% (p=0.000 n=10+9)
Decode/Digits/Default/1e5-32 775µs ± 1% 579µs ± 0% -25.35% (p=0.000 n=10+10)
Decode/Digits/Default/1e6-32 7.84ms ± 1% 5.84ms ± 1% -25.59% (p=0.000 n=10+10)
Decode/Digits/Compression/1e4-32 74.1µs ± 0% 49.8µs ± 0% -32.83% (p=0.000 n=10+10)
Decode/Digits/Compression/1e5-32 777µs ± 1% 579µs ± 0% -25.47% (p=0.000 n=10+10)
Decode/Digits/Compression/1e6-32 7.83ms ± 1% 5.83ms ± 0% -25.59% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e4-32 72.9µs ± 0% 45.6µs ± 1% -37.48% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e5-32 712µs ± 1% 471µs ± 1% -33.92% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e6-32 7.11ms ± 0% 4.70ms ± 1% -33.98% (p=0.000 n=10+10)
Decode/Newton/Speed/1e4-32 67.0µs ± 1% 45.4µs ± 1% -32.19% (p=0.000 n=10+10)
Decode/Newton/Speed/1e5-32 616µs ± 1% 447µs ± 0% -27.49% (p=0.000 n=10+10)
Decode/Newton/Speed/1e6-32 6.17ms ± 0% 4.50ms ± 0% -26.98% (p=0.000 n=10+10)
Decode/Newton/Default/1e4-32 60.7µs ± 0% 39.6µs ± 0% -34.84% (p=0.000 n=10+10)
Decode/Newton/Default/1e5-32 492µs ± 0% 360µs ± 0% -26.84% (p=0.000 n=10+10)
Decode/Newton/Default/1e6-32 4.87ms ± 1% 3.59ms ± 0% -26.34% (p=0.000 n=10+10)
Decode/Newton/Compression/1e4-32 60.8µs ± 1% 39.6µs ± 1% -34.92% (p=0.000 n=10+10)
Decode/Newton/Compression/1e5-32 491µs ± 1% 357µs ± 1% -27.23% (p=0.000 n=10+10)
Decode/Newton/Compression/1e6-32 4.84ms ± 0% 3.58ms ± 0% -26.17% (p=0.000 n=10+10)

name                              old speed      new speed      delta
Decode/Digits/Huffman/1e4-32       157MB/s ± 0%   242MB/s ± 0%   +54.37%  (p=0.000 n=10+10)
Decode/Digits/Huffman/1e5-32 160MB/s ± 0% 247MB/s ± 0% +54.58% (p=0.000 n=10+10)
Decode/Digits/Huffman/1e6-32 160MB/s ± 0% 249MB/s ± 1% +55.39% (p=0.000 n=10+10)
Decode/Digits/Speed/1e4-32 139MB/s ± 1% 208MB/s ± 0% +50.06% (p=0.000 n=10+9)
Decode/Digits/Speed/1e5-32 126MB/s ± 1% 173MB/s ± 1% +37.05% (p=0.000 n=10+10)
Decode/Digits/Speed/1e6-32 124MB/s ± 0% 171MB/s ± 0% +38.28% (p=0.000 n=9+10)
Decode/Digits/Default/1e4-32 135MB/s ± 1% 201MB/s ± 1% +48.95% (p=0.000 n=10+9)
Decode/Digits/Default/1e5-32 129MB/s ± 1% 173MB/s ± 0% +33.95% (p=0.000 n=10+10)
Decode/Digits/Default/1e6-32 127MB/s ± 1% 171MB/s ± 1% +34.39% (p=0.000 n=10+10)
Decode/Digits/Compression/1e4-32 135MB/s ± 0% 201MB/s ± 0% +48.88% (p=0.000 n=10+10)
Decode/Digits/Compression/1e5-32 129MB/s ± 1% 173MB/s ± 0% +34.17% (p=0.000 n=10+10)
Decode/Digits/Compression/1e6-32 128MB/s ± 1% 172MB/s ± 0% +34.39% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e4-32 137MB/s ± 0% 219MB/s ± 1% +59.96% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e5-32 140MB/s ± 1% 212MB/s ± 1% +51.32% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e6-32 141MB/s ± 0% 213MB/s ± 1% +51.46% (p=0.000 n=10+10)
Decode/Newton/Speed/1e4-32 149MB/s ± 1% 220MB/s ± 1% +47.48% (p=0.000 n=10+10)
Decode/Newton/Speed/1e5-32 162MB/s ± 1% 224MB/s ± 0% +37.92% (p=0.000 n=10+10)
Decode/Newton/Speed/1e6-32 162MB/s ± 0% 222MB/s ± 0% +36.95% (p=0.000 n=10+10)
Decode/Newton/Default/1e4-32 165MB/s ± 0% 253MB/s ± 0% +53.47% (p=0.000 n=10+10)
Decode/Newton/Default/1e5-32 203MB/s ± 0% 278MB/s ± 0% +36.68% (p=0.000 n=10+10)
Decode/Newton/Default/1e6-32 205MB/s ± 1% 279MB/s ± 0% +35.77% (p=0.000 n=10+10)
Decode/Newton/Compression/1e4-32 164MB/s ± 1% 253MB/s ± 1% +53.66% (p=0.000 n=10+10)
Decode/Newton/Compression/1e5-32 204MB/s ± 1% 280MB/s ± 1% +37.41% (p=0.000 n=10+10)
Decode/Newton/Compression/1e6-32 206MB/s ± 0% 280MB/s ± 0% +35.44% (p=0.000 n=10+10)

name                              old alloc/op   new alloc/op   delta
Decode/Digits/Huffman/1e4-32        0.00B ±NaN%    16.00B ± 0%     +Inf%  (p=0.000 n=10+10)
Decode/Digits/Huffman/1e5-32 6.00B ± 0% 36.00B ± 0% +500.00% (p=0.000 n=10+10)
Decode/Digits/Huffman/1e6-32 64.0B ± 0% 304.0B ± 0% +375.00% (p=0.000 n=10+9)
Decode/Digits/Speed/1e4-32 80.0B ± 0% 16.0B ± 0% -80.00% (p=0.000 n=10+10)
Decode/Digits/Speed/1e5-32 296B ± 0% 39B ± 0% -86.82% (p=0.000 n=10+8)
Decode/Digits/Speed/1e6-32 3.78kB ± 0% 0.33kB ± 0% -91.29% (p=0.000 n=10+10)
Decode/Digits/Default/1e4-32 40.0B ± 0% 16.0B ± 0% -60.00% (p=0.000 n=10+10)
Decode/Digits/Default/1e5-32 287B ± 0% 54B ± 1% -81.04% (p=0.000 n=10+10)
Decode/Digits/Default/1e6-32 4.15kB ± 0% 0.44kB ± 0% -89.43% (p=0.000 n=10+9)
Decode/Digits/Compression/1e4-32 40.0B ± 0% 16.0B ± 0% -60.00% (p=0.000 n=10+10)
Decode/Digits/Compression/1e5-32 288B ± 0% 55B ± 1% -81.01% (p=0.000 n=10+10)
Decode/Digits/Compression/1e6-32 4.15kB ± 0% 0.44kB ± 0% -89.43% (p=0.000 n=10+8)
Decode/Newton/Huffman/1e4-32 705B ± 0% 16B ± 0% -97.73% (p=0.000 n=9+10)
Decode/Newton/Huffman/1e5-32 4.49kB ± 0% 0.04kB ± 0% -99.15% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e6-32 39.4kB ± 0% 0.3kB ± 0% -99.18% (p=0.000 n=9+10)
Decode/Newton/Speed/1e4-32 617B ± 0% 16B ± 0% -97.41% (p=0.000 n=10+10)
Decode/Newton/Speed/1e5-32 3.19kB ± 0% 0.04kB ± 0% -98.84% (p=0.000 n=10+10)
Decode/Newton/Speed/1e6-32 40.5kB ± 0% 0.3kB ± 0% -99.15% (p=0.000 n=10+10)
Decode/Newton/Default/1e4-32 513B ± 0% 16B ± 0% -96.88% (p=0.000 n=10+10)
Decode/Newton/Default/1e5-32 2.35kB ± 0% 0.04kB ± 0% -98.47% (p=0.000 n=10+10)
Decode/Newton/Default/1e6-32 21.1kB ± 0% 0.3kB ± 0% -98.80% (p=0.000 n=8+8)
Decode/Newton/Compression/1e4-32 513B ± 0% 16B ± 0% -96.88% (p=0.000 n=10+10)
Decode/Newton/Compression/1e5-32 2.35kB ± 0% 0.04kB ± 0% -98.47% (p=0.000 n=10+10)
Decode/Newton/Compression/1e6-32 22.9kB ± 0% 0.2kB ± 0% -98.92% (p=0.000 n=8+8)

name                              old allocs/op  new allocs/op  delta
Decode/Digits/Huffman/1e4-32         0.00 ±NaN%      1.00 ± 0%     +Inf%  (p=0.000 n=10+10)
Decode/Digits/Huffman/1e5-32 0.00 ±NaN% 2.00 ± 0% +Inf% (p=0.000 n=10+10)
Decode/Digits/Huffman/1e6-32 0.00 ±NaN% 16.00 ± 0% +Inf% (p=0.000 n=10+10)
Decode/Digits/Speed/1e4-32 3.00 ± 0% 1.00 ± 0% -66.67% (p=0.000 n=10+10)
Decode/Digits/Speed/1e5-32 6.00 ± 0% 2.00 ± 0% -66.67% (p=0.000 n=10+10)
Decode/Digits/Speed/1e6-32 68.0 ± 0% 16.0 ± 0% -76.47% (p=0.000 n=10+10)
Decode/Digits/Default/1e4-32 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10)
Decode/Digits/Default/1e5-32 8.00 ± 0% 3.00 ± 0% -62.50% (p=0.000 n=10+10)
Decode/Digits/Default/1e6-32 74.0 ± 0% 23.0 ± 0% -68.92% (p=0.000 n=10+10)
Decode/Digits/Compression/1e4-32 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10)
Decode/Digits/Compression/1e5-32 8.00 ± 0% 3.00 ± 0% -62.50% (p=0.000 n=10+10)
Decode/Digits/Compression/1e6-32 74.0 ± 0% 23.0 ± 0% -68.92% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e4-32 9.00 ± 0% 1.00 ± 0% -88.89% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e5-32 18.0 ± 0% 2.0 ± 0% -88.89% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e6-32 156 ± 0% 16 ± 0% -89.74% (p=0.000 n=10+10)
Decode/Newton/Speed/1e4-32 13.0 ± 0% 1.0 ± 0% -92.31% (p=0.000 n=10+10)
Decode/Newton/Speed/1e5-32 26.0 ± 0% 2.0 ± 0% -92.31% (p=0.000 n=10+10)
Decode/Newton/Speed/1e6-32 223 ± 0% 16 ± 0% -92.83% (p=0.000 n=10+10)
Decode/Newton/Default/1e4-32 10.0 ± 0% 1.0 ± 0% -90.00% (p=0.000 n=10+10)
Decode/Newton/Default/1e5-32 27.0 ± 0% 2.0 ± 0% -92.59% (p=0.000 n=10+10)
Decode/Newton/Default/1e6-32 153 ± 0% 12 ± 0% -92.16% (p=0.000 n=10+10)
Decode/Newton/Compression/1e4-32 10.0 ± 0% 1.0 ± 0% -90.00% (p=0.000 n=10+10)
Decode/Newton/Compression/1e5-32 27.0 ± 0% 2.0 ± 0% -92.59% (p=0.000 n=10+10)
Decode/Newton/Compression/1e6-32 145 ± 0% 12 ± 0% -91.72% (p=0.000 n=10+10)

```

These changes have been included in github.com/klauspost/compress
for a little more than a month now, which includes fuzz testing.

Change-Id: I7e346330512116baa27e448aa606a2f4e551054c

Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
GitHub-Last-Rev: 161f02171c05a37751c7ca314daf5e5b4809905f

GitHub-Pull-Request: golang/go#38324
---
A src/compress/flate/gen_inflate.go
M src/compress/flate/inflate.go
A src/compress/flate/inflate_gen.go
M src/compress/flate/reader_test.go
4 files changed, 1,653 insertions(+), 186 deletions(-)

To view, visit change 227737. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
Gerrit-Change-Number: 227737
Gerrit-PatchSet: 6
Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
Gerrit-Reviewer: Joe Tsai <joe...@google.com>
Gerrit-Reviewer: Matthew Dempsky <mdem...@google.com>
Gerrit-CC: Bryan Mills <bcm...@google.com>
Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
Gerrit-CC: Klaus Post <klau...@gmail.com>
Gerrit-Attention: Ian Lance Taylor <ia...@golang.org>
Gerrit-MessageType: newpatchset

Ian Lance Taylor (Gerrit)

unread,
Sep 8, 2022, 8:22:54 PM9/8/22
to Gerrit Bot, Klaus Post, goph...@pubsubhelper.golang.org, Matthew Dempsky, Bryan Mills, Ian Lance Taylor, golang-co...@googlegroups.com

View Change

1 comment:

  • Patchset:

    • Patch Set #6:

      Ping Joe Tsai: do you have any time to look at this one? Thanks.

To view, visit change 227737. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
Gerrit-Change-Number: 227737
Gerrit-PatchSet: 6
Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
Gerrit-Reviewer: Joe Tsai <joe...@google.com>
Gerrit-Reviewer: Matthew Dempsky <mdem...@google.com>
Gerrit-CC: Bryan Mills <bcm...@google.com>
Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
Gerrit-CC: Klaus Post <klau...@gmail.com>
Gerrit-Comment-Date: Fri, 09 Sep 2022 00:22:49 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Gerrit-MessageType: comment

Ian Lance Taylor (Gerrit)

unread,
Nov 8, 2022, 4:29:51 PM11/8/22
to Gerrit Bot, Klaus Post, goph...@pubsubhelper.golang.org, Joe Tsai, Matthew Dempsky, Bryan Mills, Ian Lance Taylor, golang-co...@googlegroups.com

View Change

1 comment:

To view, visit change 227737. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
Gerrit-Change-Number: 227737
Gerrit-PatchSet: 6
Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
Gerrit-Reviewer: Joe Tsai <joe...@google.com>
Gerrit-Reviewer: Matthew Dempsky <mdem...@google.com>
Gerrit-CC: Bryan Mills <bcm...@google.com>
Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
Gerrit-CC: Joe Tsai <thebroke...@gmail.com>
Gerrit-CC: Klaus Post <klau...@gmail.com>
Gerrit-Comment-Date: Tue, 08 Nov 2022 21:29:45 +0000

Joseph Tsai (Gerrit)

unread,
Nov 8, 2022, 4:49:42 PM11/8/22
to Gerrit Bot, Klaus Post, goph...@pubsubhelper.golang.org, Joe Tsai, Matthew Dempsky, Bryan Mills, Ian Lance Taylor, golang-co...@googlegroups.com

View Change

1 comment:

  • Patchset:

    • Patch Set #6:

      This is a lot of code to work around the fact that the API guarantees that we never read a single byte past the final block IF the input reader happens to implement `io.ByteReader`.

      Most usages do not actually pass an `io.ByteReader`, and so the implementation implicitly wraps it within a `bufio.Reader`. There seems to be an impedance mismatch in the complexity of this code and what happens in practice.

      I argue that we should instead specialize for `bufio.Reader` and make use of the `bufio.Reader.{Buffered,Peek,Discard}` methods to fetch a chunk of the upcoming data. This avoids the tricky logic checking for the final block and computing the maximum number of bits we are allowed to read. This approach is what's taken in https://github.com/dsnet/compress/blob/f66993602bf5da07ef49d35b08e7264ae9fe2b6e/internal/prefix/reader.go#L253-L317 and results in performance gains comparable to what's reported here for much less code complexity.

      We already have a dependency on `bufio.Reader`, so special-casing that type seem palatable. Anyone wanting better performance, can just wrap their input in a `bufio.Reader`.

      Adding a dependency on "bytes" and "strings" is expensive and shouldn't be done lightly. See https://github.com/golang/go/issues/54098. Until that issue is resolved, we shouldn't be adding dependencies on these if it's not necessary.

To view, visit change 227737. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
Gerrit-Change-Number: 227737
Gerrit-PatchSet: 6
Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
Gerrit-Reviewer: Joe Tsai <joe...@google.com>
Gerrit-Reviewer: Matthew Dempsky <mdem...@google.com>
Gerrit-CC: Bryan Mills <bcm...@google.com>
Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
Gerrit-CC: Joe Tsai <thebroke...@gmail.com>
Gerrit-CC: Joseph Tsai <joe...@digital-static.net>
Gerrit-CC: Klaus Post <klau...@gmail.com>
Gerrit-Comment-Date: Tue, 08 Nov 2022 21:49:36 +0000

Joseph Tsai (Gerrit)

unread,
Nov 8, 2022, 4:50:37 PM11/8/22
to Gerrit Bot, Klaus Post, goph...@pubsubhelper.golang.org, Joe Tsai, Matthew Dempsky, Bryan Mills, Ian Lance Taylor, golang-co...@googlegroups.com

Attention is currently required from: Ian Lance Taylor.

View Change

1 comment:

  • Patchset:

    • Patch Set #6:

      Ping Joe Tsai: do you have any time to look at this one? Thanks.

      It seems that I never got the email since I'm no longer reachable at joe...@google.com :|

To view, visit change 227737. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
Gerrit-Change-Number: 227737
Gerrit-PatchSet: 6
Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
Gerrit-Reviewer: Joe Tsai <joe...@google.com>
Gerrit-Reviewer: Matthew Dempsky <mdem...@google.com>
Gerrit-CC: Bryan Mills <bcm...@google.com>
Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
Gerrit-CC: Joe Tsai <thebroke...@gmail.com>
Gerrit-CC: Joseph Tsai <joe...@digital-static.net>
Gerrit-CC: Klaus Post <klau...@gmail.com>
Gerrit-Attention: Ian Lance Taylor <ia...@golang.org>
Gerrit-Comment-Date: Tue, 08 Nov 2022 21:50:31 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Comment-In-Reply-To: Ian Lance Taylor <ia...@golang.org>
Gerrit-MessageType: comment

Joseph Tsai (Gerrit)

unread,
Nov 8, 2022, 4:54:59 PM11/8/22
to Gerrit Bot, Klaus Post, goph...@pubsubhelper.golang.org, Joe Tsai, Matthew Dempsky, Bryan Mills, Ian Lance Taylor, golang-co...@googlegroups.com

Attention is currently required from: Ian Lance Taylor.

View Change

1 comment:

  • Patchset:

    • Patch Set #6:

      This is a lot of code to work around the fact that the API guarantees that we never read a single by […]

      It seems that "bufio" already depends on "bytes" and "strings", so my last point isn't valid, but the rest of my argument still stands.

      If we want to optimize `bytes.Buffer`, `strings.Reader`, and `bytes.Reader` with my proposed approach, we can wrap it in a way that presents an API similar to `bufio.Reader`: https://github.com/dsnet/compress/blob/master/internal/prefix/wrap.go

To view, visit change 227737. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
Gerrit-Change-Number: 227737
Gerrit-PatchSet: 6
Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
Gerrit-Reviewer: Joe Tsai <joe...@google.com>
Gerrit-Reviewer: Matthew Dempsky <mdem...@google.com>
Gerrit-CC: Bryan Mills <bcm...@google.com>
Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
Gerrit-CC: Joe Tsai <thebroke...@gmail.com>
Gerrit-CC: Joseph Tsai <joe...@digital-static.net>
Gerrit-CC: Klaus Post <klau...@gmail.com>
Gerrit-Attention: Ian Lance Taylor <ia...@golang.org>
Gerrit-Comment-Date: Tue, 08 Nov 2022 21:54:53 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Comment-In-Reply-To: Joseph Tsai <joe...@digital-static.net>
Gerrit-MessageType: comment

Gerrit Bot (Gerrit)

unread,
Oct 23, 2025, 6:41:04 AMOct 23
to Klaus Post, goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com
Attention needed from Ian Lance Taylor, Joe Tsai and Matthew Dempsky

Gerrit Bot uploaded new patchset

Gerrit Bot uploaded patch set #7 to this change.
Open in Gerrit

Related details

Attention is currently required from:
  • Ian Lance Taylor
  • Joe Tsai
  • Matthew Dempsky
Submit Requirements:
  • requirement is not satisfiedCode-Review
  • requirement is not satisfiedNo-Unresolved-Comments
  • requirement is not satisfiedReview-Enforcement
  • requirement is not satisfiedTryBots-Pass
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
Gerrit-MessageType: newpatchset
Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
Gerrit-Change-Number: 227737
Gerrit-PatchSet: 7
Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
Gerrit-Reviewer: Joe Tsai <joe...@google.com>
Gerrit-Reviewer: Matthew Dempsky <mdem...@google.com>
Gerrit-CC: Bryan Mills <bcm...@google.com>
Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
Gerrit-CC: Joe Tsai <thebroke...@gmail.com>
Gerrit-CC: Joseph Tsai <joe...@digital-static.net>
Gerrit-CC: Klaus Post <klau...@gmail.com>
Gerrit-Attention: Matthew Dempsky <mdem...@google.com>
Gerrit-Attention: Joe Tsai <joe...@google.com>
unsatisfied_requirement
open
diffy

Klaus Post (Gerrit)

unread,
Oct 23, 2025, 6:42:01 AMOct 23
to Gerrit Bot, goph...@pubsubhelper.golang.org, Joseph Tsai, Joe Tsai, Matthew Dempsky, Bryan Mills, Joe Tsai, Ian Lance Taylor, golang-co...@googlegroups.com
Attention needed from Ian Lance Taylor, Joe Tsai and Matthew Dempsky

Klaus Post added 1 comment

Patchset-level comments
File-level comment, Patchset 6:
Joseph Tsai . unresolved

This is a lot of code to work around the fact that the API guarantees that we never read a single byte past the final block IF the input reader happens to implement `io.ByteReader`.

Most usages do not actually pass an `io.ByteReader`, and so the implementation implicitly wraps it within a `bufio.Reader`. There seems to be an impedance mismatch in the complexity of this code and what happens in practice.

I argue that we should instead specialize for `bufio.Reader` and make use of the `bufio.Reader.{Buffered,Peek,Discard}` methods to fetch a chunk of the upcoming data. This avoids the tricky logic checking for the final block and computing the maximum number of bits we are allowed to read. This approach is what's taken in https://github.com/dsnet/compress/blob/f66993602bf5da07ef49d35b08e7264ae9fe2b6e/internal/prefix/reader.go#L253-L317 and results in performance gains comparable to what's reported here for much less code complexity.

We already have a dependency on `bufio.Reader`, so special-casing that type seem palatable. Anyone wanting better performance, can just wrap their input in a `bufio.Reader`.

Adding a dependency on "bytes" and "strings" is expensive and shouldn't be done lightly. See https://github.com/golang/go/issues/54098. Until that issue is resolved, we shouldn't be adding dependencies on these if it's not necessary.

Joseph Tsai

It seems that "bufio" already depends on "bytes" and "strings", so my last point isn't valid, but the rest of my argument still stands.

If we want to optimize `bytes.Buffer`, `strings.Reader`, and `bytes.Reader` with my proposed approach, we can wrap it in a way that presents an API similar to `bufio.Reader`: https://github.com/dsnet/compress/blob/master/internal/prefix/wrap.go

Klaus Post

I honestly don't feel like a total rewrite and this is mostly a verbatim copy of well tested code.

Most usages do not actually pass an `io.ByteReader`.

HTTP requests do, which is by far the most common usage.

Open in Gerrit

Related details

Attention is currently required from:
  • Ian Lance Taylor
  • Joe Tsai
  • Matthew Dempsky
Submit Requirements:
  • requirement is not satisfiedCode-Review
  • requirement is not satisfiedNo-Unresolved-Comments
  • requirement is not satisfiedReview-Enforcement
  • requirement is not satisfiedTryBots-Pass
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
Gerrit-MessageType: comment
Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
Gerrit-Change-Number: 227737
Gerrit-PatchSet: 7
Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
Gerrit-Reviewer: Joe Tsai <joe...@google.com>
Gerrit-Reviewer: Matthew Dempsky <mdem...@google.com>
Gerrit-CC: Bryan Mills <bcm...@google.com>
Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
Gerrit-CC: Joe Tsai <thebroke...@gmail.com>
Gerrit-CC: Joseph Tsai <joe...@digital-static.net>
Gerrit-CC: Klaus Post <klau...@gmail.com>
Gerrit-Attention: Matthew Dempsky <mdem...@google.com>
Gerrit-Attention: Joe Tsai <joe...@google.com>
Gerrit-Attention: Ian Lance Taylor <ia...@golang.org>
Gerrit-Comment-Date: Thu, 23 Oct 2025 10:41:52 +0000
unsatisfied_requirement
open
diffy

Klaus Post (Gerrit)

unread,
Oct 23, 2025, 6:56:06 AMOct 23
to Gerrit Bot, goph...@pubsubhelper.golang.org, Joseph Tsai, Joe Tsai, Matthew Dempsky, Bryan Mills, Joe Tsai, Ian Lance Taylor, golang-co...@googlegroups.com
Attention needed from Ian Lance Taylor, Joe Tsai and Matthew Dempsky

Klaus Post added 2 comments

Patchset-level comments
Joseph Tsai . unresolved

This is a lot of code to work around the fact that the API guarantees that we never read a single byte past the final block IF the input reader happens to implement `io.ByteReader`.

Most usages do not actually pass an `io.ByteReader`, and so the implementation implicitly wraps it within a `bufio.Reader`. There seems to be an impedance mismatch in the complexity of this code and what happens in practice.

I argue that we should instead specialize for `bufio.Reader` and make use of the `bufio.Reader.{Buffered,Peek,Discard}` methods to fetch a chunk of the upcoming data. This avoids the tricky logic checking for the final block and computing the maximum number of bits we are allowed to read. This approach is what's taken in https://github.com/dsnet/compress/blob/f66993602bf5da07ef49d35b08e7264ae9fe2b6e/internal/prefix/reader.go#L253-L317 and results in performance gains comparable to what's reported here for much less code complexity.

We already have a dependency on `bufio.Reader`, so special-casing that type seem palatable. Anyone wanting better performance, can just wrap their input in a `bufio.Reader`.

Adding a dependency on "bytes" and "strings" is expensive and shouldn't be done lightly. See https://github.com/golang/go/issues/54098. Until that issue is resolved, we shouldn't be adding dependencies on these if it's not necessary.

Joseph Tsai

It seems that "bufio" already depends on "bytes" and "strings", so my last point isn't valid, but the rest of my argument still stands.

If we want to optimize `bytes.Buffer`, `strings.Reader`, and `bytes.Reader` with my proposed approach, we can wrap it in a way that presents an API similar to `bufio.Reader`: https://github.com/dsnet/compress/blob/master/internal/prefix/wrap.go

Klaus Post

I honestly don't feel like a total rewrite and this is mostly a verbatim copy of well tested code.

Most usages do not actually pass an `io.ByteReader`.

HTTP requests do, which is by far the most common usage.

Klaus Post

Most usages do not actually pass an io.ByteReader, and so the implementation implicitly wraps it within a bufio.Reader.

Actually, it will also speed up these, since the wrapped is never currently accessed directly, but only through `(decompressor).r` interface.

File-level comment, Patchset 7 (Latest):
Klaus Post . resolved

I have attempted to do a generic implementation, but it is something like 10-20% slower than this: https://github.com/klauspost/compress/commit/355740f356693049e3b107fb140a5a4813cf6397

It doesn't really change the imports either.

Open in Gerrit

Related details

Attention is currently required from:
  • Ian Lance Taylor
  • Joe Tsai
  • Matthew Dempsky
Submit Requirements:
  • requirement is not satisfiedCode-Review
  • requirement is not satisfiedNo-Unresolved-Comments
  • requirement is not satisfiedReview-Enforcement
  • requirement is not satisfiedTryBots-Pass
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
Gerrit-MessageType: comment
Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
Gerrit-Change-Number: 227737
Gerrit-PatchSet: 7
Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
Gerrit-Reviewer: Joe Tsai <joe...@google.com>
Gerrit-Reviewer: Matthew Dempsky <mdem...@google.com>
Gerrit-CC: Bryan Mills <bcm...@google.com>
Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
Gerrit-CC: Joe Tsai <thebroke...@gmail.com>
Gerrit-CC: Joseph Tsai <joe...@digital-static.net>
Gerrit-CC: Klaus Post <klau...@gmail.com>
Gerrit-Attention: Matthew Dempsky <mdem...@google.com>
Gerrit-Attention: Joe Tsai <joe...@google.com>
Gerrit-Attention: Ian Lance Taylor <ia...@golang.org>
Gerrit-Comment-Date: Thu, 23 Oct 2025 10:55:57 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Comment-In-Reply-To: Klaus Post <klau...@gmail.com>
Comment-In-Reply-To: Joseph Tsai <joe...@digital-static.net>
unsatisfied_requirement
open
diffy

AHMAD ابو وليد (Gerrit)

unread,
Oct 23, 2025, 3:09:21 PMOct 23
to Klaus Post, Gerrit Bot, goph...@pubsubhelper.golang.org, Joseph Tsai, Joe Tsai, Matthew Dempsky, Bryan Mills, Joe Tsai, Ian Lance Taylor, golang-co...@googlegroups.com
Attention needed from Ian Lance Taylor

AHMAD ابو وليد added 1 comment

Patchset-level comments
File-level comment, Patchset 7 (Latest):
AHMAD ابو وليد . resolved

227737

Open in Gerrit

Related details

Attention is currently required from:
  • Ian Lance Taylor
Submit Requirements:
  • requirement is not satisfiedCode-Review
  • requirement is not satisfiedNo-Unresolved-Comments
  • requirement is not satisfiedReview-Enforcement
  • requirement is not satisfiedTryBots-Pass
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
Gerrit-MessageType: comment
Gerrit-Project: go
Gerrit-Branch: master
Gerrit-Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
Gerrit-Change-Number: 227737
Gerrit-PatchSet: 7
Gerrit-Owner: Gerrit Bot <letsus...@gmail.com>
Gerrit-Reviewer: Joe Tsai <joe...@google.com>
Gerrit-Reviewer: Matthew Dempsky <mdem...@google.com>
Gerrit-CC: AHMAD ابو وليد <miz...@gmail.com>
Gerrit-CC: Bryan Mills <bcm...@google.com>
Gerrit-CC: Ian Lance Taylor <ia...@golang.org>
Gerrit-CC: Joe Tsai <thebroke...@gmail.com>
Gerrit-CC: Joseph Tsai <joe...@digital-static.net>
Gerrit-CC: Klaus Post <klau...@gmail.com>
Gerrit-Attention: Ian Lance Taylor <ia...@golang.org>
Gerrit-Comment-Date: Thu, 23 Oct 2025 19:09:10 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
unsatisfied_requirement
open
diffy
Reply all
Reply to author
Forward
0 new messages