Gerrit Bot would like Klaus Post to review this change.
compress/flate: improve decompression speed
Improve inflate decompression speed, mainly through 3 optimizations:
1) Read further ahead on non-final blocks.
The reader guarantees that it will not read beyond the end of the stream.
This limits how far ahead the decoder can read; the limit is set to the size
of an end-of-block marker via `f.h1.min = f.bits[endBlockMarker]`.
We can, however, take advantage of the fact that each block header states
whether the block is the final one in the stream.
So if we are not reading the final block, we can safely add the size
of the smallest block possible: one with a predefined table and nothing
but a single EOB marker. Since we know the size of that block header
(3 bits) and the fixed encoding of the EOB marker (7 bits), this totals
10 additional bits.
Adding these 10 bits reduces the number of stream reads significantly,
for approximately a 5% throughput increase.
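The arithmetic behind those 10 bits can be sketched as follows (a standalone illustration; the constant names are hypothetical, not from this change):

```go
package main

import "fmt"

func main() {
	// A DEFLATE block header is 3 bits: 1 bit BFINAL + 2 bits BTYPE.
	const blockHeaderBits = 1 + 2
	// With the predefined (fixed) Huffman table, the end-of-block
	// symbol (value 256) is encoded as a 7-bit code.
	const fixedEOBBits = 7
	// The smallest block that can follow a non-final block is a fixed
	// block holding nothing but an EOB marker.
	fmt.Println(blockHeaderBits + fixedEOBBits) // 10
}
```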
2) Manually inline the f.huffSym call
This change by itself gives about a 13% throughput increase.
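The pattern behind the manual inlining can be sketched like this (a hypothetical bitReader type for illustration, not code from this change): the hot fields are copied into locals for the duration of the loop, since the compiler keeps locals in registers but reloads fields reached through a pointer, and the locals are written back on every exit path.

```go
package main

import "fmt"

type bitReader struct {
	b  uint32 // bit buffer
	nb uint   // number of valid bits in b
}

// refill loads bytes into the bit buffer until it is nearly full,
// returning how many bytes were consumed.
func (f *bitReader) refill(src []byte) int {
	// Locals b and nb can live in registers; f.b and f.nb often cannot.
	nb, b := f.nb, f.b
	n := 0
	for _, c := range src {
		b |= uint32(c) << (nb & 31)
		nb += 8
		n++
		if nb > 24 {
			break
		}
	}
	f.b, f.nb = b, nb // write the locals back before returning
	return n
}

func main() {
	var f bitReader
	fmt.Println(f.refill([]byte{1, 2, 3, 4}))
}
```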
3) Generate decoders for stdlib io.ByteReader types
We generate decoders for the known implementations of `io.ByteReader`,
namely `*bytes.Buffer`, `*bytes.Reader`, `*bufio.Reader` and `*strings.Reader`.
This change by itself gives about 20-25% throughput increase,
including when an `io.Reader` is passed.
Of these, only `*strings.Reader` is probably uncommon in practice.
Minor changes:
* Reuse `h.chunks` and `h.links`.
* Trade some bounds checks for AND operations.
* Change chunks from uint32 to uint16.
* Avoid padding of decompressor struct members.
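The bounds-check trade mentioned above relies on masking the index with a power-of-two-minus-one so the compiler can prove the access is in range, as in count[n&maxCodeLenMask] in the diff. A minimal sketch:

```go
package main

import "fmt"

func main() {
	var count [16]int
	const mask = 15 // len(count) - 1, a power of two minus one
	for _, n := range []int{3, 7, 3, 15} {
		// n & mask is provably < 16, so the compiler can drop the
		// bounds check; out-of-range n values simply land in the
		// wrong bucket and are caught by later validity checks.
		count[n&mask]++
	}
	fmt.Println(count[3], count[7], count[15]) // 2 1 1
}
```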
Per-loop allocations have been removed from the benchmarks.
The 'old' numbers in the benchmark below already include this change.
```
name old time/op new time/op delta
Decode/Digits/Huffman/1e4-32 78.0µs ± 0% 50.5µs ± 1% -35.26% (p=0.008 n=5+5)
Decode/Digits/Huffman/1e5-32 779µs ± 2% 487µs ± 0% -37.48% (p=0.008 n=5+5)
Decode/Digits/Huffman/1e6-32 7.68ms ± 0% 4.88ms ± 1% -36.44% (p=0.008 n=5+5)
Decode/Digits/Speed/1e4-32 88.5µs ± 1% 59.9µs ± 1% -32.33% (p=0.008 n=5+5)
Decode/Digits/Speed/1e5-32 963µs ± 1% 678µs ± 1% -29.58% (p=0.008 n=5+5)
Decode/Digits/Speed/1e6-32 9.75ms ± 1% 6.90ms ± 0% -29.21% (p=0.008 n=5+5)
Decode/Digits/Default/1e4-32 91.2µs ± 1% 61.4µs ± 0% -32.72% (p=0.008 n=5+5)
Decode/Digits/Default/1e5-32 954µs ± 0% 675µs ± 0% -29.25% (p=0.008 n=5+5)
Decode/Digits/Default/1e6-32 9.67ms ± 0% 6.79ms ± 1% -29.76% (p=0.008 n=5+5)
Decode/Digits/Compression/1e4-32 90.7µs ± 1% 61.5µs ± 1% -32.21% (p=0.008 n=5+5)
Decode/Digits/Compression/1e5-32 953µs ± 1% 672µs ± 0% -29.46% (p=0.016 n=4+5)
Decode/Digits/Compression/1e6-32 9.76ms ± 4% 6.78ms ± 0% -30.54% (p=0.008 n=5+5)
Decode/Newton/Huffman/1e4-32 90.4µs ± 0% 54.7µs ± 0% -39.52% (p=0.008 n=5+5)
Decode/Newton/Huffman/1e5-32 885µs ± 0% 538µs ± 0% -39.19% (p=0.008 n=5+5)
Decode/Newton/Huffman/1e6-32 8.84ms ± 0% 5.44ms ± 0% -38.46% (p=0.016 n=4+5)
Decode/Newton/Speed/1e4-32 81.5µs ± 0% 55.1µs ± 1% -32.42% (p=0.016 n=4+5)
Decode/Newton/Speed/1e5-32 751µs ± 4% 528µs ± 0% -29.70% (p=0.008 n=5+5)
Decode/Newton/Speed/1e6-32 7.49ms ± 2% 5.32ms ± 0% -28.92% (p=0.008 n=5+5)
Decode/Newton/Default/1e4-32 73.3µs ± 1% 48.9µs ± 1% -33.36% (p=0.008 n=5+5)
Decode/Newton/Default/1e5-32 601µs ± 2% 418µs ± 0% -30.40% (p=0.008 n=5+5)
Decode/Newton/Default/1e6-32 5.92ms ± 0% 4.17ms ± 0% -29.60% (p=0.008 n=5+5)
Decode/Newton/Compression/1e4-32 72.7µs ± 0% 48.5µs ± 0% -33.21% (p=0.008 n=5+5)
Decode/Newton/Compression/1e5-32 597µs ± 0% 418µs ± 0% -29.90% (p=0.008 n=5+5)
Decode/Newton/Compression/1e6-32 5.90ms ± 0% 4.15ms ± 0% -29.63% (p=0.016 n=4+5)
name old speed new speed delta
Decode/Digits/Huffman/1e4-32 128MB/s ± 0% 198MB/s ± 1% +54.46% (p=0.008 n=5+5)
Decode/Digits/Huffman/1e5-32 128MB/s ± 2% 205MB/s ± 0% +59.92% (p=0.008 n=5+5)
Decode/Digits/Huffman/1e6-32 130MB/s ± 0% 205MB/s ± 1% +57.33% (p=0.008 n=5+5)
Decode/Digits/Speed/1e4-32 113MB/s ± 1% 167MB/s ± 1% +47.79% (p=0.008 n=5+5)
Decode/Digits/Speed/1e5-32 104MB/s ± 1% 147MB/s ± 1% +42.01% (p=0.008 n=5+5)
Decode/Digits/Speed/1e6-32 103MB/s ± 1% 145MB/s ± 0% +41.26% (p=0.008 n=5+5)
Decode/Digits/Default/1e4-32 110MB/s ± 1% 163MB/s ± 0% +48.63% (p=0.008 n=5+5)
Decode/Digits/Default/1e5-32 105MB/s ± 0% 148MB/s ± 0% +41.34% (p=0.008 n=5+5)
Decode/Digits/Default/1e6-32 103MB/s ± 0% 147MB/s ± 1% +42.37% (p=0.008 n=5+5)
Decode/Digits/Compression/1e4-32 110MB/s ± 1% 163MB/s ± 1% +47.51% (p=0.008 n=5+5)
Decode/Digits/Compression/1e5-32 105MB/s ± 1% 149MB/s ± 0% +41.77% (p=0.016 n=4+5)
Decode/Digits/Compression/1e6-32 102MB/s ± 4% 147MB/s ± 0% +43.91% (p=0.008 n=5+5)
Decode/Newton/Huffman/1e4-32 111MB/s ± 0% 183MB/s ± 0% +65.35% (p=0.008 n=5+5)
Decode/Newton/Huffman/1e5-32 113MB/s ± 0% 186MB/s ± 0% +64.44% (p=0.008 n=5+5)
Decode/Newton/Huffman/1e6-32 113MB/s ± 0% 184MB/s ± 0% +62.50% (p=0.016 n=4+5)
Decode/Newton/Speed/1e4-32 123MB/s ± 0% 182MB/s ± 1% +47.98% (p=0.016 n=4+5)
Decode/Newton/Speed/1e5-32 133MB/s ± 4% 189MB/s ± 0% +42.20% (p=0.008 n=5+5)
Decode/Newton/Speed/1e6-32 134MB/s ± 2% 188MB/s ± 0% +40.67% (p=0.008 n=5+5)
Decode/Newton/Default/1e4-32 136MB/s ± 1% 205MB/s ± 1% +50.05% (p=0.008 n=5+5)
Decode/Newton/Default/1e5-32 166MB/s ± 2% 239MB/s ± 0% +43.67% (p=0.008 n=5+5)
Decode/Newton/Default/1e6-32 169MB/s ± 0% 240MB/s ± 0% +42.04% (p=0.008 n=5+5)
Decode/Newton/Compression/1e4-32 138MB/s ± 0% 206MB/s ± 0% +49.73% (p=0.008 n=5+5)
Decode/Newton/Compression/1e5-32 168MB/s ± 0% 239MB/s ± 0% +42.66% (p=0.008 n=5+5)
Decode/Newton/Compression/1e6-32 170MB/s ± 0% 241MB/s ± 0% +42.11% (p=0.016 n=4+5)
name old alloc/op new alloc/op delta
Decode/Digits/Huffman/1e4-32 0.00B ±NaN% 16.00B ± 0% +Inf% (p=0.008 n=5+5)
Decode/Digits/Huffman/1e5-32 7.60B ± 8% 32.00B ± 0% +321.05% (p=0.008 n=5+5)
Decode/Digits/Huffman/1e6-32 79.6B ± 1% 264.0B ± 0% +231.66% (p=0.008 n=5+5)
Decode/Digits/Speed/1e4-32 80.0B ± 0% 16.0B ± 0% -80.00% (p=0.008 n=5+5)
Decode/Digits/Speed/1e5-32 297B ± 0% 33B ± 0% ~ (p=0.079 n=4+5)
Decode/Digits/Speed/1e6-32 3.86kB ± 0% 0.27kB ± 0% -92.98% (p=0.008 n=5+5)
Decode/Digits/Default/1e4-32 48.0B ± 0% 16.0B ± 0% -66.67% (p=0.008 n=5+5)
Decode/Digits/Default/1e5-32 297B ± 0% 49B ± 0% -83.50% (p=0.008 n=5+5)
Decode/Digits/Default/1e6-32 4.28kB ± 0% 0.38kB ± 0% ~ (p=0.079 n=4+5)
Decode/Digits/Compression/1e4-32 48.0B ± 0% 16.0B ± 0% -66.67% (p=0.008 n=5+5)
Decode/Digits/Compression/1e5-32 297B ± 0% 49B ± 0% ~ (p=0.079 n=4+5)
Decode/Digits/Compression/1e6-32 4.28kB ± 0% 0.38kB ± 0% -91.09% (p=0.000 n=4+5)
Decode/Newton/Huffman/1e4-32 705B ± 0% 16B ± 0% -97.73% (p=0.008 n=5+5)
Decode/Newton/Huffman/1e5-32 4.50kB ± 0% 0.03kB ± 0% -99.27% (p=0.008 n=5+5)
Decode/Newton/Huffman/1e6-32 39.4kB ± 0% 0.3kB ± 0% -99.29% (p=0.008 n=5+5)
Decode/Newton/Speed/1e4-32 625B ± 0% 16B ± 0% -97.44% (p=0.008 n=5+5)
Decode/Newton/Speed/1e5-32 3.21kB ± 0% 0.03kB ± 0% -98.97% (p=0.008 n=5+5)
Decode/Newton/Speed/1e6-32 40.6kB ± 0% 0.3kB ± 0% -99.25% (p=0.008 n=5+5)
Decode/Newton/Default/1e4-32 513B ± 0% 16B ± 0% -96.88% (p=0.008 n=5+5)
Decode/Newton/Default/1e5-32 2.37kB ± 0% 0.03kB ± 0% -98.61% (p=0.008 n=5+5)
Decode/Newton/Default/1e6-32 21.2kB ± 0% 0.2kB ± 0% -98.97% (p=0.008 n=5+5)
Decode/Newton/Compression/1e4-32 513B ± 0% 16B ± 0% -96.88% (p=0.008 n=5+5)
Decode/Newton/Compression/1e5-32 2.37kB ± 0% 0.03kB ± 0% -98.61% (p=0.008 n=5+5)
Decode/Newton/Compression/1e6-32 23.0kB ± 0% 0.2kB ± 0% -99.07% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
Decode/Digits/Huffman/1e4-32 0.00 ±NaN% 1.00 ± 0% +Inf% (p=0.008 n=5+5)
Decode/Digits/Huffman/1e5-32 0.00 ±NaN% 2.00 ± 0% +Inf% (p=0.008 n=5+5)
Decode/Digits/Huffman/1e6-32 0.00 ±NaN% 16.00 ± 0% +Inf% (p=0.008 n=5+5)
Decode/Digits/Speed/1e4-32 3.00 ± 0% 1.00 ± 0% -66.67% (p=0.008 n=5+5)
Decode/Digits/Speed/1e5-32 6.00 ± 0% 2.00 ± 0% -66.67% (p=0.008 n=5+5)
Decode/Digits/Speed/1e6-32 68.0 ± 0% 16.0 ± 0% -76.47% (p=0.008 n=5+5)
Decode/Digits/Default/1e4-32 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.008 n=5+5)
Decode/Digits/Default/1e5-32 8.00 ± 0% 3.00 ± 0% -62.50% (p=0.008 n=5+5)
Decode/Digits/Default/1e6-32 74.0 ± 0% 23.0 ± 0% -68.92% (p=0.008 n=5+5)
Decode/Digits/Compression/1e4-32 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.008 n=5+5)
Decode/Digits/Compression/1e5-32 8.00 ± 0% 3.00 ± 0% -62.50% (p=0.008 n=5+5)
Decode/Digits/Compression/1e6-32 74.0 ± 0% 23.0 ± 0% -68.92% (p=0.008 n=5+5)
Decode/Newton/Huffman/1e4-32 9.00 ± 0% 1.00 ± 0% -88.89% (p=0.008 n=5+5)
Decode/Newton/Huffman/1e5-32 18.0 ± 0% 2.0 ± 0% -88.89% (p=0.008 n=5+5)
Decode/Newton/Huffman/1e6-32 156 ± 0% 16 ± 0% -89.74% (p=0.008 n=5+5)
Decode/Newton/Speed/1e4-32 13.0 ± 0% 1.0 ± 0% -92.31% (p=0.008 n=5+5)
Decode/Newton/Speed/1e5-32 26.0 ± 0% 2.0 ± 0% -92.31% (p=0.008 n=5+5)
Decode/Newton/Speed/1e6-32 223 ± 0% 16 ± 0% -92.83% (p=0.008 n=5+5)
Decode/Newton/Default/1e4-32 10.0 ± 0% 1.0 ± 0% -90.00% (p=0.008 n=5+5)
Decode/Newton/Default/1e5-32 27.0 ± 0% 2.0 ± 0% -92.59% (p=0.008 n=5+5)
Decode/Newton/Default/1e6-32 153 ± 0% 12 ± 0% -92.16% (p=0.008 n=5+5)
Decode/Newton/Compression/1e4-32 10.0 ± 0% 1.0 ± 0% -90.00% (p=0.008 n=5+5)
Decode/Newton/Compression/1e5-32 27.0 ± 0% 2.0 ± 0% -92.59% (p=0.008 n=5+5)
Decode/Newton/Compression/1e6-32 145 ± 0% 12 ± 0% -91.72% (p=0.008 n=5+5)
```
These changes have been included in github.com/klauspost/compress
for a little more than a month now, during which they have also been fuzz tested.
Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
GitHub-Last-Rev: 6180f3c8841bad9184e2d9d43c6829eb9b7a63e5
GitHub-Pull-Request: golang/go#38324
---
A src/compress/flate/gen_inflate.go
M src/compress/flate/inflate.go
A src/compress/flate/inflate_gen.go
M src/compress/flate/reader_test.go
4 files changed, 1,220 insertions(+), 46 deletions(-)
diff --git a/src/compress/flate/gen_inflate.go b/src/compress/flate/gen_inflate.go
new file mode 100644
index 0000000..9db11f3
--- /dev/null
+++ b/src/compress/flate/gen_inflate.go
@@ -0,0 +1,259 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Package flate implements the DEFLATE compressed data format, described in
+// RFC 1951. The gzip and zlib packages implement access to DEFLATE-based file
+// formats.
+
+// +build generate
+
+//go:generate go run $GOFILE && gofmt -w inflate_gen.go
+
+package main
+
+import (
+ "os"
+ "strings"
+)
+
+func main() {
+ f, err := os.Create("inflate_gen.go")
+ if err != nil {
+ panic(err)
+ }
+ defer f.Close()
+ types := []string{"*bytes.Buffer", "*bytes.Reader", "*bufio.Reader", "*strings.Reader"}
+ names := []string{"BytesBuffer", "BytesReader", "BufioReader", "StringsReader"}
+ imports := []string{"bytes", "bufio", "io", "strings", "math/bits"}
+ f.WriteString(`// Code generated by go generate gen_inflate.go. DO NOT EDIT.
+
+package flate
+
+import (
+`)
+
+ for _, imp := range imports {
+ f.WriteString("\t\"" + imp + "\"\n")
+ }
+ f.WriteString(")\n\n")
+
+ template := `
+
+// $FUNCNAME$ decodes a single Huffman block from f.
+// f.r must be a $TYPE$.
+// hl and hd are the Huffman states for the lit/length values
+// and the distance values, respectively. If hd == nil, using the
+// fixed distance encoding associated with fixed Huffman blocks.
+func (f *decompressor) $FUNCNAME$() {
+ const (
+ stateInit = iota // Zero value must be stateInit
+ stateDict
+ )
+ fr := f.r.($TYPE$)
+ moreBits := func() error {
+ c, err := fr.ReadByte()
+ if err != nil {
+ return noEOF(err)
+ }
+ f.roffset++
+ f.b |= uint32(c) << f.nb
+ f.nb += 8
+ return nil
+ }
+
+ switch f.stepState {
+ case stateInit:
+ goto readLiteral
+ case stateDict:
+ goto copyHistory
+ }
+
+readLiteral:
+ // Read literal and/or (length, distance) according to RFC section 3.2.3.
+ {
+ var v int
+ {
+ // Inlined v, err := f.huffSym(f.hl)
+ // Since a huffmanDecoder can be empty or be composed of a degenerate tree
+ // with single element, huffSym must error on these two edge cases. In both
+ // cases, the chunks slice will be 0 for the invalid sequence, leading it to
+ // satisfy the n == 0 check below.
+ n := uint(f.hl.maxRead)
+ // Optimization. Compiler isn't smart enough to keep f.b,f.nb in registers,
+ // but is smart enough to keep local variables in registers, so use nb and b,
+ // inline call to moreBits and reassign b,nb back to f on return.
+ nb, b := f.nb, f.b
+ for {
+ for nb < n {
+ c, err := fr.ReadByte()
+ if err != nil {
+ f.b = b
+ f.nb = nb
+ f.err = noEOF(err)
+ return
+ }
+ f.roffset++
+ b |= uint32(c) << (nb & 31)
+ nb += 8
+ }
+ chunk := f.hl.chunks[b&(huffmanNumChunks-1)]
+ n = uint(chunk & huffmanCountMask)
+ if n > huffmanChunkBits {
+ chunk = f.hl.links[chunk>>huffmanValueShift][(b>>huffmanChunkBits)&f.hl.linkMask]
+ n = uint(chunk & huffmanCountMask)
+ }
+ if n <= nb {
+ if n == 0 {
+ f.b = b
+ f.nb = nb
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+ f.b = b >> (n & 31)
+ f.nb = nb - n
+ v = int(chunk >> huffmanValueShift)
+ break
+ }
+ }
+ }
+
+ var n uint // number of bits extra
+ var length int
+ var err error
+ switch {
+ case v < 256:
+ f.dict.writeByte(byte(v))
+ if f.dict.availWrite() == 0 {
+ f.toRead = f.dict.readFlush()
+ f.step = (*decompressor).$FUNCNAME$
+ f.stepState = stateInit
+ return
+ }
+ goto readLiteral
+ case v == 256:
+ f.finishBlock()
+ return
+ // otherwise, reference to older data
+ case v < 265:
+ length = v - (257 - 3)
+ n = 0
+ case v < 269:
+ length = v*2 - (265*2 - 11)
+ n = 1
+ case v < 273:
+ length = v*4 - (269*4 - 19)
+ n = 2
+ case v < 277:
+ length = v*8 - (273*8 - 35)
+ n = 3
+ case v < 281:
+ length = v*16 - (277*16 - 67)
+ n = 4
+ case v < 285:
+ length = v*32 - (281*32 - 131)
+ n = 5
+ case v < maxNumLit:
+ length = 258
+ n = 0
+ default:
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+ if n > 0 {
+ for f.nb < n {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ length += int(f.b & uint32(1<<n-1))
+ f.b >>= n
+ f.nb -= n
+ }
+
+ var dist int
+ if f.hd == nil {
+ for f.nb < 5 {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ dist = int(bits.Reverse8(uint8(f.b & 0x1F << 3)))
+ f.b >>= 5
+ f.nb -= 5
+ } else {
+ if dist, err = f.huffSym(f.hd); err != nil {
+ f.err = err
+ return
+ }
+ }
+
+ switch {
+ case dist < 4:
+ dist++
+ case dist < maxNumDist:
+ nb := uint(dist-2) >> 1
+ // have 1 bit in bottom of dist, need nb more.
+ extra := (dist & 1) << nb
+ for f.nb < nb {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ extra |= int(f.b & uint32(1<<nb-1))
+ f.b >>= nb
+ f.nb -= nb
+ dist = 1<<(nb+1) + 1 + extra
+ default:
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+
+ // No check on length; encoding can be prescient.
+ if dist > f.dict.histSize() {
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+
+ f.copyLen, f.copyDist = length, dist
+ goto copyHistory
+ }
+
+copyHistory:
+ // Perform a backwards copy according to RFC section 3.2.3.
+ {
+ cnt := f.dict.tryWriteCopy(f.copyDist, f.copyLen)
+ if cnt == 0 {
+ cnt = f.dict.writeCopy(f.copyDist, f.copyLen)
+ }
+ f.copyLen -= cnt
+
+ if f.dict.availWrite() == 0 || f.copyLen > 0 {
+ f.toRead = f.dict.readFlush()
+ f.step = (*decompressor).$FUNCNAME$ // We need to continue this work
+ f.stepState = stateDict
+ return
+ }
+ goto readLiteral
+ }
+}
+
+`
+ for i, t := range types {
+ s := strings.Replace(template, "$FUNCNAME$", "huffman"+names[i], -1)
+ s = strings.Replace(s, "$TYPE$", t, -1)
+ f.WriteString(s)
+ }
+ f.WriteString("func (f *decompressor) huffmanBlockDecoder() func() {\n")
+ f.WriteString("\tswitch f.r.(type) {\n")
+ for i, t := range types {
+ f.WriteString("\t\tcase " + t + ":\n")
+ f.WriteString("\t\t\treturn f.huffman" + names[i] + "\n")
+ }
+ f.WriteString("\t\tdefault:\n")
+ f.WriteString("\t\t\treturn f.huffmanBlockGeneric")
+ f.WriteString("\t}\n}\n")
+}
diff --git a/src/compress/flate/inflate.go b/src/compress/flate/inflate.go
index 4992139..0145ee8 100644
--- a/src/compress/flate/inflate.go
+++ b/src/compress/flate/inflate.go
@@ -16,7 +16,8 @@
)
const (
- maxCodeLen = 16 // max length of Huffman code
+ maxCodeLen = 16 // max length of Huffman code
+ maxCodeLenMask = 15 // mask for max length of Huffman code
// The next three numbers come from the RFC section 3.2.7, with the
// additional proviso in section 3.2.5 which implies that distance codes
// 30 and 31 should never occur in compressed data.
@@ -102,10 +103,10 @@
)
type huffmanDecoder struct {
- min int // the minimum code length
- chunks [huffmanNumChunks]uint32 // chunks as described above
- links [][]uint32 // overflow links
- linkMask uint32 // mask the width of the link table
+ maxRead int // the maximum number of bits we can read and not overread
+ chunks *[huffmanNumChunks]uint16 // chunks as described above
+ links [][]uint16 // overflow links
+ linkMask uint32 // mask the width of the link table
}
// Initialize Huffman decoding tables from array of code lengths.
@@ -119,12 +120,15 @@
// development to supplement the currently ad-hoc unit tests.
const sanity = false
- if h.min != 0 {
- *h = huffmanDecoder{}
+ if h.chunks == nil {
+ h.chunks = &[huffmanNumChunks]uint16{}
+ }
+ if h.maxRead != 0 {
+ *h = huffmanDecoder{chunks: h.chunks, links: h.links}
}
// Count number of codes of each length,
- // compute min and max length.
+ // compute maxRead and max length.
var count [maxCodeLen]int
var min, max int
for _, n := range lengths {
@@ -137,7 +141,7 @@
if n > max {
max = n
}
- count[n]++
+ count[n&maxCodeLenMask]++
}
// Empty tree. The decompressor.huffSym function will fail later if the tree
@@ -155,8 +159,8 @@
var nextcode [maxCodeLen]int
for i := min; i <= max; i++ {
code <<= 1
- nextcode[i] = code
- code += count[i]
+ nextcode[i&maxCodeLenMask] = code
+ code += count[i&maxCodeLenMask]
}
// Check that the coding is complete (i.e., that we've
@@ -168,14 +172,23 @@
return false
}
- h.min = min
+ h.maxRead = min
+ chunks := h.chunks[:]
+ for i := range chunks {
+ chunks[i] = 0
+ }
+
if max > huffmanChunkBits {
numLinks := 1 << (uint(max) - huffmanChunkBits)
h.linkMask = uint32(numLinks - 1)
// create link tables
link := nextcode[huffmanChunkBits+1] >> 1
- h.links = make([][]uint32, huffmanNumChunks-link)
+ if cap(h.links) < huffmanNumChunks-link {
+ h.links = make([][]uint16, huffmanNumChunks-link)
+ } else {
+ h.links = h.links[:huffmanNumChunks-link]
+ }
for j := uint(link); j < huffmanNumChunks; j++ {
reverse := int(bits.Reverse16(uint16(j)))
reverse >>= uint(16 - huffmanChunkBits)
@@ -183,9 +196,16 @@
if sanity && h.chunks[reverse] != 0 {
panic("impossible: overwriting existing chunk")
}
- h.chunks[reverse] = uint32(off<<huffmanValueShift | (huffmanChunkBits + 1))
- h.links[off] = make([]uint32, numLinks)
+ h.chunks[reverse] = uint16(off<<huffmanValueShift | (huffmanChunkBits + 1))
+ if cap(h.links[off]) < numLinks {
+ h.links[off] = make([]uint16, numLinks)
+ } else {
+ links := h.links[off][:0]
+ h.links[off] = links[:numLinks]
+ }
}
+ } else {
+ h.links = h.links[:0]
}
for i, n := range lengths {
@@ -194,7 +214,7 @@
}
code := nextcode[n]
nextcode[n]++
- chunk := uint32(i<<huffmanValueShift | n)
+ chunk := uint16(i<<huffmanValueShift | n)
reverse := int(bits.Reverse16(uint16(code)))
reverse >>= uint(16 - n)
if n <= huffmanChunkBits {
@@ -269,10 +289,6 @@
r Reader
roffset int64
- // Input bits, in top of b.
- b uint32
- nb uint
-
// Huffman decoders for literal/length, distance.
h1, h2 huffmanDecoder
@@ -283,19 +299,24 @@
// Output history, buffer.
dict dictDecoder
- // Temporary buffer (avoids repeated allocation).
- buf [4]byte
-
// Next step in the decompression,
// and decompression state.
step func(*decompressor)
stepState int
- final bool
err error
toRead []byte
hl, hd *huffmanDecoder
copyLen int
copyDist int
+
+ // Temporary buffer (avoids repeated allocation).
+ buf [4]byte
+
+ // Input bits, in top of b.
+ b uint32
+
+ nb uint
+ final bool
}
func (f *decompressor) nextBlock() {
@@ -316,7 +337,7 @@
// compressed, fixed Huffman tables
f.hl = &fixedHuffmanDecoder
f.hd = nil
- f.huffmanBlock()
+ f.huffmanBlockDecoder()()
case 2:
// compressed, dynamic Huffman tables
if f.err = f.readHuffman(); f.err != nil {
@@ -324,7 +345,7 @@
}
f.hl = &f.h1
f.hd = &f.h2
- f.huffmanBlock()
+ f.huffmanBlockDecoder()()
default:
// 3 is reserved.
f.err = CorruptInputError(f.roffset)
@@ -460,12 +481,18 @@
return CorruptInputError(f.roffset)
}
- // As an optimization, we can initialize the min bits to read at a time
+ // As an optimization, we can initialize the maxRead bits to read at a time
// for the HLIT tree to the length of the EOB marker since we know that
// every block must terminate with one. This preserves the property that
// we never read any extra bytes after the end of the DEFLATE stream.
- if f.h1.min < f.bits[endBlockMarker] {
- f.h1.min = f.bits[endBlockMarker]
+ if f.h1.maxRead < f.bits[endBlockMarker] {
+ f.h1.maxRead = f.bits[endBlockMarker]
+ }
+ if !f.final {
+ // If not the final block, the smallest block possible is
+ // a predefined table, BTYPE=01, with a single EOB marker.
+ // This will take up 3 + 7 bits.
+ f.h1.maxRead += 10
}
return nil
@@ -475,7 +502,7 @@
// hl and hd are the Huffman states for the lit/length values
// and the distance values, respectively. If hd == nil, using the
// fixed distance encoding associated with fixed Huffman blocks.
-func (f *decompressor) huffmanBlock() {
+func (f *decompressor) huffmanBlockGeneric() {
const (
stateInit = iota // Zero value must be stateInit
stateDict
@@ -491,19 +518,61 @@
readLiteral:
// Read literal and/or (length, distance) according to RFC section 3.2.3.
{
- v, err := f.huffSym(f.hl)
- if err != nil {
- f.err = err
- return
+ var v int
+ {
+ // Inlined v, err := f.huffSym(f.hl)
+ // Since a huffmanDecoder can be empty or be composed of a degenerate tree
+ // with single element, huffSym must error on these two edge cases. In both
+ // cases, the chunks slice will be 0 for the invalid sequence, leading it to
+ // satisfy the n == 0 check below.
+ n := uint(f.hl.maxRead)
+ // Optimization. Compiler isn't smart enough to keep f.b,f.nb in registers,
+ // but is smart enough to keep local variables in registers, so use nb and b,
+ // inline call to moreBits and reassign b,nb back to f on return.
+ nb, b := f.nb, f.b
+ for {
+ for nb < n {
+ c, err := f.r.ReadByte()
+ if err != nil {
+ f.b = b
+ f.nb = nb
+ f.err = noEOF(err)
+ return
+ }
+ f.roffset++
+ b |= uint32(c) << (nb & 31)
+ nb += 8
+ }
+ chunk := f.hl.chunks[b&(huffmanNumChunks-1)]
+ n = uint(chunk & huffmanCountMask)
+ if n > huffmanChunkBits {
+ chunk = f.hl.links[chunk>>huffmanValueShift][(b>>huffmanChunkBits)&f.hl.linkMask]
+ n = uint(chunk & huffmanCountMask)
+ }
+ if n <= nb {
+ if n == 0 {
+ f.b = b
+ f.nb = nb
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+ f.b = b >> (n & 31)
+ f.nb = nb - n
+ v = int(chunk >> huffmanValueShift)
+ break
+ }
+ }
}
+
var n uint // number of bits extra
var length int
+ var err error
switch {
case v < 256:
f.dict.writeByte(byte(v))
if f.dict.availWrite() == 0 {
f.toRead = f.dict.readFlush()
- f.step = (*decompressor).huffmanBlock
+ f.step = (*decompressor).huffmanBlockGeneric
f.stepState = stateInit
return
}
@@ -610,7 +679,7 @@
if f.dict.availWrite() == 0 || f.copyLen > 0 {
f.toRead = f.dict.readFlush()
- f.step = (*decompressor).huffmanBlock // We need to continue this work
+ f.step = (*decompressor).huffmanBlockGeneric // We need to continue this work
f.stepState = stateDict
return
}
@@ -621,20 +690,31 @@
// Copy a single uncompressed data block from input to output.
func (f *decompressor) dataBlock() {
// Uncompressed.
- // Discard current half-byte.
- f.nb = 0
- f.b = 0
+ // Discard current partial byte.
+ left := (f.nb) & 7
+ f.nb -= left
+ f.b >>= left
+
+ offBytes := f.nb >> 3
+ // Unfilled values will be overwritten.
+ f.buf[0] = uint8(f.b)
+ f.buf[1] = uint8(f.b >> 8)
+ f.buf[2] = uint8(f.b >> 16)
+ f.buf[3] = uint8(f.b >> 24)
+
+ f.roffset += int64(offBytes)
+ f.nb, f.b = 0, 0
// Length then ones-complement of length.
- nr, err := io.ReadFull(f.r, f.buf[0:4])
+ nr, err := io.ReadFull(f.r, f.buf[offBytes:4])
f.roffset += int64(nr)
if err != nil {
f.err = noEOF(err)
return
}
- n := int(f.buf[0]) | int(f.buf[1])<<8
- nn := int(f.buf[2]) | int(f.buf[3])<<8
- if uint16(nn) != uint16(^n) {
+ n := uint16(f.buf[0]) | uint16(f.buf[1])<<8
+ nn := uint16(f.buf[2]) | uint16(f.buf[3])<<8
+ if nn != ^n {
f.err = CorruptInputError(f.roffset)
return
}
@@ -645,7 +725,7 @@
return
}
- f.copyLen = n
+ f.copyLen = int(n)
f.copyData()
}
@@ -709,7 +789,7 @@
// with single element, huffSym must error on these two edge cases. In both
// cases, the chunks slice will be 0 for the invalid sequence, leading it to
// satisfy the n == 0 check below.
- n := uint(h.min)
+ n := uint(h.maxRead)
// Optimization. Compiler isn't smart enough to keep f.b,f.nb in registers,
// but is smart enough to keep local variables in registers, so use nb and b,
// inline call to moreBits and reassign b,nb back to f on return.
@@ -778,6 +858,8 @@
r: makeReader(r),
bits: f.bits,
codebits: f.codebits,
+ h1: f.h1,
+ h2: f.h2,
dict: f.dict,
step: (*decompressor).nextBlock,
}
diff --git a/src/compress/flate/inflate_gen.go b/src/compress/flate/inflate_gen.go
new file mode 100644
index 0000000..b7e6727
--- /dev/null
+++ b/src/compress/flate/inflate_gen.go
@@ -0,0 +1,829 @@
+// Code generated by go generate gen_inflate.go. DO NOT EDIT.
+
+package flate
+
+import (
+ "bufio"
+ "bytes"
+ "math/bits"
+ "strings"
+)
+
+// huffmanBytesBuffer decodes a single Huffman block from f.
+// f.r must be a *bytes.Buffer.
+// hl and hd are the Huffman states for the lit/length values
+// and the distance values, respectively. If hd == nil, using the
+// fixed distance encoding associated with fixed Huffman blocks.
+func (f *decompressor) huffmanBytesBuffer() {
+ const (
+ stateInit = iota // Zero value must be stateInit
+ stateDict
+ )
+ fr := f.r.(*bytes.Buffer)
+ moreBits := func() error {
+ c, err := fr.ReadByte()
+ if err != nil {
+ return noEOF(err)
+ }
+ f.roffset++
+ f.b |= uint32(c) << f.nb
+ f.nb += 8
+ return nil
+ }
+
+ switch f.stepState {
+ case stateInit:
+ goto readLiteral
+ case stateDict:
+ goto copyHistory
+ }
+
+readLiteral:
+ // Read literal and/or (length, distance) according to RFC section 3.2.3.
+ {
+ var v int
+ {
+ // Inlined v, err := f.huffSym(f.hl)
+ // Since a huffmanDecoder can be empty or be composed of a degenerate tree
+ // with single element, huffSym must error on these two edge cases. In both
+ // cases, the chunks slice will be 0 for the invalid sequence, leading it to
+ // satisfy the n == 0 check below.
+ n := uint(f.hl.maxRead)
+ // Optimization. Compiler isn't smart enough to keep f.b,f.nb in registers,
+ // but is smart enough to keep local variables in registers, so use nb and b,
+ // inline call to moreBits and reassign b,nb back to f on return.
+ nb, b := f.nb, f.b
+ for {
+ for nb < n {
+ c, err := fr.ReadByte()
+ if err != nil {
+ f.b = b
+ f.nb = nb
+ f.err = noEOF(err)
+ return
+ }
+ f.roffset++
+ b |= uint32(c) << (nb & 31)
+ nb += 8
+ }
+ chunk := f.hl.chunks[b&(huffmanNumChunks-1)]
+ n = uint(chunk & huffmanCountMask)
+ if n > huffmanChunkBits {
+ chunk = f.hl.links[chunk>>huffmanValueShift][(b>>huffmanChunkBits)&f.hl.linkMask]
+ n = uint(chunk & huffmanCountMask)
+ }
+ if n <= nb {
+ if n == 0 {
+ f.b = b
+ f.nb = nb
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+ f.b = b >> (n & 31)
+ f.nb = nb - n
+ v = int(chunk >> huffmanValueShift)
+ break
+ }
+ }
+ }
+
+ var n uint // number of bits extra
+ var length int
+ var err error
+ switch {
+ case v < 256:
+ f.dict.writeByte(byte(v))
+ if f.dict.availWrite() == 0 {
+ f.toRead = f.dict.readFlush()
+ f.step = (*decompressor).huffmanBytesBuffer
+ f.stepState = stateInit
+ return
+ }
+ goto readLiteral
+ case v == 256:
+ f.finishBlock()
+ return
+ // otherwise, reference to older data
+ case v < 265:
+ length = v - (257 - 3)
+ n = 0
+ case v < 269:
+ length = v*2 - (265*2 - 11)
+ n = 1
+ case v < 273:
+ length = v*4 - (269*4 - 19)
+ n = 2
+ case v < 277:
+ length = v*8 - (273*8 - 35)
+ n = 3
+ case v < 281:
+ length = v*16 - (277*16 - 67)
+ n = 4
+ case v < 285:
+ length = v*32 - (281*32 - 131)
+ n = 5
+ case v < maxNumLit:
+ length = 258
+ n = 0
+ default:
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+ if n > 0 {
+ for f.nb < n {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ length += int(f.b & uint32(1<<n-1))
+ f.b >>= n
+ f.nb -= n
+ }
+
+ var dist int
+ if f.hd == nil {
+ for f.nb < 5 {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ dist = int(bits.Reverse8(uint8(f.b & 0x1F << 3)))
+ f.b >>= 5
+ f.nb -= 5
+ } else {
+ if dist, err = f.huffSym(f.hd); err != nil {
+ f.err = err
+ return
+ }
+ }
+
+ switch {
+ case dist < 4:
+ dist++
+ case dist < maxNumDist:
+ nb := uint(dist-2) >> 1
+ // have 1 bit in bottom of dist, need nb more.
+ extra := (dist & 1) << nb
+ for f.nb < nb {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ extra |= int(f.b & uint32(1<<nb-1))
+ f.b >>= nb
+ f.nb -= nb
+ dist = 1<<(nb+1) + 1 + extra
+ default:
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+
+ // No check on length; encoding can be prescient.
+ if dist > f.dict.histSize() {
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+
+ f.copyLen, f.copyDist = length, dist
+ goto copyHistory
+ }
+
+copyHistory:
+ // Perform a backwards copy according to RFC section 3.2.3.
+ {
+ cnt := f.dict.tryWriteCopy(f.copyDist, f.copyLen)
+ if cnt == 0 {
+ cnt = f.dict.writeCopy(f.copyDist, f.copyLen)
+ }
+ f.copyLen -= cnt
+
+ if f.dict.availWrite() == 0 || f.copyLen > 0 {
+ f.toRead = f.dict.readFlush()
+ f.step = (*decompressor).huffmanBytesBuffer // We need to continue this work
+ f.stepState = stateDict
+ return
+ }
+ goto readLiteral
+ }
+}
+
+// huffmanBytesReader decodes a single Huffman block from f.
+// f.r must be a *bytes.Reader.
+// hl and hd are the Huffman states for the lit/length values
+// and the distance values, respectively. If hd == nil, using the
+// fixed distance encoding associated with fixed Huffman blocks.
+func (f *decompressor) huffmanBytesReader() {
+ const (
+ stateInit = iota // Zero value must be stateInit
+ stateDict
+ )
+ fr := f.r.(*bytes.Reader)
+ moreBits := func() error {
+ c, err := fr.ReadByte()
+ if err != nil {
+ return noEOF(err)
+ }
+ f.roffset++
+ f.b |= uint32(c) << f.nb
+ f.nb += 8
+ return nil
+ }
+
+ switch f.stepState {
+ case stateInit:
+ goto readLiteral
+ case stateDict:
+ goto copyHistory
+ }
+
+readLiteral:
+ // Read literal and/or (length, distance) according to RFC section 3.2.3.
+ {
+ var v int
+ {
+ // Inlined v, err := f.huffSym(f.hl)
+ // Since a huffmanDecoder can be empty or be composed of a degenerate tree
+ // with single element, huffSym must error on these two edge cases. In both
+ // cases, the chunks slice will be 0 for the invalid sequence, leading it to
+ // satisfy the n == 0 check below.
+ n := uint(f.hl.maxRead)
+ // Optimization. Compiler isn't smart enough to keep f.b,f.nb in registers,
+ // but is smart enough to keep local variables in registers, so use nb and b,
+ // inline call to moreBits and reassign b,nb back to f on return.
+ nb, b := f.nb, f.b
+ for {
+ for nb < n {
+ c, err := fr.ReadByte()
+ if err != nil {
+ f.b = b
+ f.nb = nb
+ f.err = noEOF(err)
+ return
+ }
+ f.roffset++
+ b |= uint32(c) << (nb & 31)
+ nb += 8
+ }
+ chunk := f.hl.chunks[b&(huffmanNumChunks-1)]
+ n = uint(chunk & huffmanCountMask)
+ if n > huffmanChunkBits {
+ chunk = f.hl.links[chunk>>huffmanValueShift][(b>>huffmanChunkBits)&f.hl.linkMask]
+ n = uint(chunk & huffmanCountMask)
+ }
+ if n <= nb {
+ if n == 0 {
+ f.b = b
+ f.nb = nb
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+ f.b = b >> (n & 31)
+ f.nb = nb - n
+ v = int(chunk >> huffmanValueShift)
+ break
+ }
+ }
+ }
+
+ var n uint // number of bits extra
+ var length int
+ var err error
+ switch {
+ case v < 256:
+ f.dict.writeByte(byte(v))
+ if f.dict.availWrite() == 0 {
+ f.toRead = f.dict.readFlush()
+ f.step = (*decompressor).huffmanBytesReader
+ f.stepState = stateInit
+ return
+ }
+ goto readLiteral
+ case v == 256:
+ f.finishBlock()
+ return
+ // otherwise, reference to older data
+ case v < 265:
+ length = v - (257 - 3)
+ n = 0
+ case v < 269:
+ length = v*2 - (265*2 - 11)
+ n = 1
+ case v < 273:
+ length = v*4 - (269*4 - 19)
+ n = 2
+ case v < 277:
+ length = v*8 - (273*8 - 35)
+ n = 3
+ case v < 281:
+ length = v*16 - (277*16 - 67)
+ n = 4
+ case v < 285:
+ length = v*32 - (281*32 - 131)
+ n = 5
+ case v < maxNumLit:
+ length = 258
+ n = 0
+ default:
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+ if n > 0 {
+ for f.nb < n {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ length += int(f.b & uint32(1<<n-1))
+ f.b >>= n
+ f.nb -= n
+ }
+
+ var dist int
+ if f.hd == nil {
+ for f.nb < 5 {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ dist = int(bits.Reverse8(uint8(f.b & 0x1F << 3)))
+ f.b >>= 5
+ f.nb -= 5
+ } else {
+ if dist, err = f.huffSym(f.hd); err != nil {
+ f.err = err
+ return
+ }
+ }
+
+ switch {
+ case dist < 4:
+ dist++
+ case dist < maxNumDist:
+ nb := uint(dist-2) >> 1
+ // have 1 bit in bottom of dist, need nb more.
+ extra := (dist & 1) << nb
+ for f.nb < nb {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ extra |= int(f.b & uint32(1<<nb-1))
+ f.b >>= nb
+ f.nb -= nb
+ dist = 1<<(nb+1) + 1 + extra
+ default:
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+
+ // No check on length; encoding can be prescient.
+ if dist > f.dict.histSize() {
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+
+ f.copyLen, f.copyDist = length, dist
+ goto copyHistory
+ }
+
+copyHistory:
+ // Perform a backwards copy according to RFC section 3.2.3.
+ {
+ cnt := f.dict.tryWriteCopy(f.copyDist, f.copyLen)
+ if cnt == 0 {
+ cnt = f.dict.writeCopy(f.copyDist, f.copyLen)
+ }
+ f.copyLen -= cnt
+
+ if f.dict.availWrite() == 0 || f.copyLen > 0 {
+ f.toRead = f.dict.readFlush()
+ f.step = (*decompressor).huffmanBytesReader // We need to continue this work
+ f.stepState = stateDict
+ return
+ }
+ goto readLiteral
+ }
+}
+
+// huffmanBufioReader decodes a single Huffman block from f.
+// f.r must be a *bufio.Reader.
+// hl and hd are the Huffman states for the lit/length values
+// and the distance values, respectively. If hd == nil, the fixed
+// distance encoding associated with fixed Huffman blocks is used.
+func (f *decompressor) huffmanBufioReader() {
+ const (
+ stateInit = iota // Zero value must be stateInit
+ stateDict
+ )
+ fr := f.r.(*bufio.Reader)
+ moreBits := func() error {
+ c, err := fr.ReadByte()
+ if err != nil {
+ return noEOF(err)
+ }
+ f.roffset++
+ f.b |= uint32(c) << f.nb
+ f.nb += 8
+ return nil
+ }
+
+ switch f.stepState {
+ case stateInit:
+ goto readLiteral
+ case stateDict:
+ goto copyHistory
+ }
+
+readLiteral:
+ // Read literal and/or (length, distance) according to RFC section 3.2.3.
+ {
+ var v int
+ {
+ // Inlined v, err := f.huffSym(f.hl)
+ // Since a huffmanDecoder can be empty or be composed of a degenerate tree
+ // with a single element, huffSym must error on these two edge cases. In both
+ // cases, the chunks slice will be 0 for the invalid sequence, leading it to
+ // satisfy the n == 0 check below.
+ n := uint(f.hl.maxRead)
+ // Optimization. The compiler isn't smart enough to keep f.b and f.nb in registers,
+ // but is smart enough to keep local variables in registers, so use nb and b,
+ // inline the call to moreBits, and reassign b and nb back to f on return.
+ nb, b := f.nb, f.b
+ for {
+ for nb < n {
+ c, err := fr.ReadByte()
+ if err != nil {
+ f.b = b
+ f.nb = nb
+ f.err = noEOF(err)
+ return
+ }
+ f.roffset++
+ b |= uint32(c) << (nb & 31)
+ nb += 8
+ }
+ chunk := f.hl.chunks[b&(huffmanNumChunks-1)]
+ n = uint(chunk & huffmanCountMask)
+ if n > huffmanChunkBits {
+ chunk = f.hl.links[chunk>>huffmanValueShift][(b>>huffmanChunkBits)&f.hl.linkMask]
+ n = uint(chunk & huffmanCountMask)
+ }
+ if n <= nb {
+ if n == 0 {
+ f.b = b
+ f.nb = nb
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+ f.b = b >> (n & 31)
+ f.nb = nb - n
+ v = int(chunk >> huffmanValueShift)
+ break
+ }
+ }
+ }
+
+ var n uint // number of bits extra
+ var length int
+ var err error
+ switch {
+ case v < 256:
+ f.dict.writeByte(byte(v))
+ if f.dict.availWrite() == 0 {
+ f.toRead = f.dict.readFlush()
+ f.step = (*decompressor).huffmanBufioReader
+ f.stepState = stateInit
+ return
+ }
+ goto readLiteral
+ case v == 256:
+ f.finishBlock()
+ return
+ // otherwise, reference to older data
+ case v < 265:
+ length = v - (257 - 3)
+ n = 0
+ case v < 269:
+ length = v*2 - (265*2 - 11)
+ n = 1
+ case v < 273:
+ length = v*4 - (269*4 - 19)
+ n = 2
+ case v < 277:
+ length = v*8 - (273*8 - 35)
+ n = 3
+ case v < 281:
+ length = v*16 - (277*16 - 67)
+ n = 4
+ case v < 285:
+ length = v*32 - (281*32 - 131)
+ n = 5
+ case v < maxNumLit:
+ length = 258
+ n = 0
+ default:
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+ if n > 0 {
+ for f.nb < n {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ length += int(f.b & uint32(1<<n-1))
+ f.b >>= n
+ f.nb -= n
+ }
+
+ var dist int
+ if f.hd == nil {
+ for f.nb < 5 {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ dist = int(bits.Reverse8(uint8(f.b & 0x1F << 3)))
+ f.b >>= 5
+ f.nb -= 5
+ } else {
+ if dist, err = f.huffSym(f.hd); err != nil {
+ f.err = err
+ return
+ }
+ }
+
+ switch {
+ case dist < 4:
+ dist++
+ case dist < maxNumDist:
+ nb := uint(dist-2) >> 1
+ // have 1 bit in bottom of dist, need nb more.
+ extra := (dist & 1) << nb
+ for f.nb < nb {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ extra |= int(f.b & uint32(1<<nb-1))
+ f.b >>= nb
+ f.nb -= nb
+ dist = 1<<(nb+1) + 1 + extra
+ default:
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+
+ // No check on length; encoding can be prescient.
+ if dist > f.dict.histSize() {
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+
+ f.copyLen, f.copyDist = length, dist
+ goto copyHistory
+ }
+
+copyHistory:
+ // Perform a backwards copy according to RFC section 3.2.3.
+ {
+ cnt := f.dict.tryWriteCopy(f.copyDist, f.copyLen)
+ if cnt == 0 {
+ cnt = f.dict.writeCopy(f.copyDist, f.copyLen)
+ }
+ f.copyLen -= cnt
+
+ if f.dict.availWrite() == 0 || f.copyLen > 0 {
+ f.toRead = f.dict.readFlush()
+ f.step = (*decompressor).huffmanBufioReader // We need to continue this work
+ f.stepState = stateDict
+ return
+ }
+ goto readLiteral
+ }
+}
+
+// huffmanStringsReader decodes a single Huffman block from f.
+// f.r must be a *strings.Reader.
+// hl and hd are the Huffman states for the lit/length values
+// and the distance values, respectively. If hd == nil, the fixed
+// distance encoding associated with fixed Huffman blocks is used.
+func (f *decompressor) huffmanStringsReader() {
+ const (
+ stateInit = iota // Zero value must be stateInit
+ stateDict
+ )
+ fr := f.r.(*strings.Reader)
+ moreBits := func() error {
+ c, err := fr.ReadByte()
+ if err != nil {
+ return noEOF(err)
+ }
+ f.roffset++
+ f.b |= uint32(c) << f.nb
+ f.nb += 8
+ return nil
+ }
+
+ switch f.stepState {
+ case stateInit:
+ goto readLiteral
+ case stateDict:
+ goto copyHistory
+ }
+
+readLiteral:
+ // Read literal and/or (length, distance) according to RFC section 3.2.3.
+ {
+ var v int
+ {
+ // Inlined v, err := f.huffSym(f.hl)
+ // Since a huffmanDecoder can be empty or be composed of a degenerate tree
+ // with a single element, huffSym must error on these two edge cases. In both
+ // cases, the chunks slice will be 0 for the invalid sequence, leading it to
+ // satisfy the n == 0 check below.
+ n := uint(f.hl.maxRead)
+ // Optimization. The compiler isn't smart enough to keep f.b and f.nb in registers,
+ // but is smart enough to keep local variables in registers, so use nb and b,
+ // inline the call to moreBits, and reassign b and nb back to f on return.
+ nb, b := f.nb, f.b
+ for {
+ for nb < n {
+ c, err := fr.ReadByte()
+ if err != nil {
+ f.b = b
+ f.nb = nb
+ f.err = noEOF(err)
+ return
+ }
+ f.roffset++
+ b |= uint32(c) << (nb & 31)
+ nb += 8
+ }
+ chunk := f.hl.chunks[b&(huffmanNumChunks-1)]
+ n = uint(chunk & huffmanCountMask)
+ if n > huffmanChunkBits {
+ chunk = f.hl.links[chunk>>huffmanValueShift][(b>>huffmanChunkBits)&f.hl.linkMask]
+ n = uint(chunk & huffmanCountMask)
+ }
+ if n <= nb {
+ if n == 0 {
+ f.b = b
+ f.nb = nb
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+ f.b = b >> (n & 31)
+ f.nb = nb - n
+ v = int(chunk >> huffmanValueShift)
+ break
+ }
+ }
+ }
+
+ var n uint // number of bits extra
+ var length int
+ var err error
+ switch {
+ case v < 256:
+ f.dict.writeByte(byte(v))
+ if f.dict.availWrite() == 0 {
+ f.toRead = f.dict.readFlush()
+ f.step = (*decompressor).huffmanStringsReader
+ f.stepState = stateInit
+ return
+ }
+ goto readLiteral
+ case v == 256:
+ f.finishBlock()
+ return
+ // otherwise, reference to older data
+ case v < 265:
+ length = v - (257 - 3)
+ n = 0
+ case v < 269:
+ length = v*2 - (265*2 - 11)
+ n = 1
+ case v < 273:
+ length = v*4 - (269*4 - 19)
+ n = 2
+ case v < 277:
+ length = v*8 - (273*8 - 35)
+ n = 3
+ case v < 281:
+ length = v*16 - (277*16 - 67)
+ n = 4
+ case v < 285:
+ length = v*32 - (281*32 - 131)
+ n = 5
+ case v < maxNumLit:
+ length = 258
+ n = 0
+ default:
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+ if n > 0 {
+ for f.nb < n {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ length += int(f.b & uint32(1<<n-1))
+ f.b >>= n
+ f.nb -= n
+ }
+
+ var dist int
+ if f.hd == nil {
+ for f.nb < 5 {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ dist = int(bits.Reverse8(uint8(f.b & 0x1F << 3)))
+ f.b >>= 5
+ f.nb -= 5
+ } else {
+ if dist, err = f.huffSym(f.hd); err != nil {
+ f.err = err
+ return
+ }
+ }
+
+ switch {
+ case dist < 4:
+ dist++
+ case dist < maxNumDist:
+ nb := uint(dist-2) >> 1
+ // have 1 bit in bottom of dist, need nb more.
+ extra := (dist & 1) << nb
+ for f.nb < nb {
+ if err = moreBits(); err != nil {
+ f.err = err
+ return
+ }
+ }
+ extra |= int(f.b & uint32(1<<nb-1))
+ f.b >>= nb
+ f.nb -= nb
+ dist = 1<<(nb+1) + 1 + extra
+ default:
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+
+ // No check on length; encoding can be prescient.
+ if dist > f.dict.histSize() {
+ f.err = CorruptInputError(f.roffset)
+ return
+ }
+
+ f.copyLen, f.copyDist = length, dist
+ goto copyHistory
+ }
+
+copyHistory:
+ // Perform a backwards copy according to RFC section 3.2.3.
+ {
+ cnt := f.dict.tryWriteCopy(f.copyDist, f.copyLen)
+ if cnt == 0 {
+ cnt = f.dict.writeCopy(f.copyDist, f.copyLen)
+ }
+ f.copyLen -= cnt
+
+ if f.dict.availWrite() == 0 || f.copyLen > 0 {
+ f.toRead = f.dict.readFlush()
+ f.step = (*decompressor).huffmanStringsReader // We need to continue this work
+ f.stepState = stateDict
+ return
+ }
+ goto readLiteral
+ }
+}
+
+// huffmanBlockDecoder returns the Huffman block decoding function
+// specialized for the concrete type of f.r.
+func (f *decompressor) huffmanBlockDecoder() func() {
+ switch f.r.(type) {
+ case *bytes.Buffer:
+ return f.huffmanBytesBuffer
+ case *bytes.Reader:
+ return f.huffmanBytesReader
+ case *bufio.Reader:
+ return f.huffmanBufioReader
+ case *strings.Reader:
+ return f.huffmanStringsReader
+ default:
+ return f.huffmanBlockGeneric
+ }
+}
diff --git a/src/compress/flate/reader_test.go b/src/compress/flate/reader_test.go
index 9d2943a..cac99a8 100644
--- a/src/compress/flate/reader_test.go
+++ b/src/compress/flate/reader_test.go
@@ -51,10 +51,14 @@
w.Close()
buf1 := compressed.Bytes()
buf0, compressed, w = nil, nil, nil
+ src := bytes.NewReader(buf1)
+ dec := NewReader(src)
runtime.GC()
b.StartTimer()
for i := 0; i < b.N; i++ {
- io.Copy(ioutil.Discard, NewReader(bytes.NewReader(buf1)))
+ src.Reset(buf1)
+ dec.(Resetter).Reset(src, nil)
+ io.Copy(ioutil.Discard, dec)
}
})
}
To view, visit change 227737. To unsubscribe, or for help writing mail filters, visit settings.
Gerrit Bot uploaded patch set #2 to this change.
Gopher Robot abandoned this change.
Heschi Kreinick restored this change.
1 comment:
Patchset:
Sorry this has sat around for so long. Any interest in fixing the merge conflicts? Thanks.
Attention is currently required from: Ian Lance Taylor.
1 comment:
Patchset:
> Sorry this has sat around for so long. Any interest in fixing the merge conflicts? Thanks.
Sure. I have an additional 5-10% I found since then, which I will see if I can incorporate.
Maybe this could be achieved through generics instead of codegen. I will investigate a bit, since I haven't done anything with generics yet.
Gerrit Bot uploaded patch set #3 to this change.
Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
GitHub-Last-Rev: c00babd617fabf290f784c5bd074a82a6af781c5
GitHub-Pull-Request: golang/go#38324
---
A src/compress/flate/gen_inflate.go
M src/compress/flate/inflate.go
A src/compress/flate/inflate_gen.go
M src/compress/flate/reader_test.go
4 files changed, 1,386 insertions(+), 46 deletions(-)
Gerrit Bot uploaded patch set #4 to this change.
Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
GitHub-Last-Rev: ae9b62a9851eb3d8acf14cebfef1d8c663e407dc
GitHub-Pull-Request: golang/go#38324
---
A src/compress/flate/gen_inflate.go
M src/compress/flate/inflate.go
A src/compress/flate/inflate_gen.go
M src/compress/flate/reader_test.go
4 files changed, 1,665 insertions(+), 186 deletions(-)
Gerrit Bot uploaded patch set #5 to this change.
```
name old time/op new time/op delta
Decode/Digits/Huffman/1e4-32 63.8µs ± 0% 41.3µs ± 0% -35.22% (p=0.000 n=10+10)
Decode/Digits/Huffman/1e5-32 625µs ± 0% 404µs ± 0% -35.31% (p=0.000 n=10+10)
Decode/Digits/Huffman/1e6-32 6.25ms ± 0% 4.02ms ± 1% -35.64% (p=0.000 n=10+10)
Decode/Digits/Speed/1e4-32 72.1µs ± 1% 48.0µs ± 0% -33.36% (p=0.000 n=10+9)
Decode/Digits/Speed/1e5-32 792µs ± 1% 578µs ± 1% -27.04% (p=0.000 n=10+10)
Decode/Digits/Speed/1e6-32 8.09ms ± 0% 5.85ms ± 0% -27.68% (p=0.000 n=9+10)
Decode/Digits/Default/1e4-32 74.1µs ± 1% 49.7µs ± 1% -32.87% (p=0.000 n=10+9)
Decode/Digits/Default/1e5-32 775µs ± 1% 579µs ± 0% -25.35% (p=0.000 n=10+10)
Decode/Digits/Default/1e6-32 7.84ms ± 1% 5.84ms ± 1% -25.59% (p=0.000 n=10+10)
Decode/Digits/Compression/1e4-32 74.1µs ± 0% 49.8µs ± 0% -32.83% (p=0.000 n=10+10)
Decode/Digits/Compression/1e5-32 777µs ± 1% 579µs ± 0% -25.47% (p=0.000 n=10+10)
Decode/Digits/Compression/1e6-32 7.83ms ± 1% 5.83ms ± 0% -25.59% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e4-32 72.9µs ± 0% 45.6µs ± 1% -37.48% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e5-32 712µs ± 1% 471µs ± 1% -33.92% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e6-32 7.11ms ± 0% 4.70ms ± 1% -33.98% (p=0.000 n=10+10)
Decode/Newton/Speed/1e4-32 67.0µs ± 1% 45.4µs ± 1% -32.19% (p=0.000 n=10+10)
Decode/Newton/Speed/1e5-32 616µs ± 1% 447µs ± 0% -27.49% (p=0.000 n=10+10)
Decode/Newton/Speed/1e6-32 6.17ms ± 0% 4.50ms ± 0% -26.98% (p=0.000 n=10+10)
Decode/Newton/Default/1e4-32 60.7µs ± 0% 39.6µs ± 0% -34.84% (p=0.000 n=10+10)
Decode/Newton/Default/1e5-32 492µs ± 0% 360µs ± 0% -26.84% (p=0.000 n=10+10)
Decode/Newton/Default/1e6-32 4.87ms ± 1% 3.59ms ± 0% -26.34% (p=0.000 n=10+10)
Decode/Newton/Compression/1e4-32 60.8µs ± 1% 39.6µs ± 1% -34.92% (p=0.000 n=10+10)
Decode/Newton/Compression/1e5-32 491µs ± 1% 357µs ± 1% -27.23% (p=0.000 n=10+10)
Decode/Newton/Compression/1e6-32 4.84ms ± 0% 3.58ms ± 0% -26.17% (p=0.000 n=10+10)
name old speed new speed delta
Decode/Digits/Huffman/1e4-32 157MB/s ± 0% 242MB/s ± 0% +54.37% (p=0.000 n=10+10)
Decode/Digits/Huffman/1e5-32 160MB/s ± 0% 247MB/s ± 0% +54.58% (p=0.000 n=10+10)
Decode/Digits/Huffman/1e6-32 160MB/s ± 0% 249MB/s ± 1% +55.39% (p=0.000 n=10+10)
Decode/Digits/Speed/1e4-32 139MB/s ± 1% 208MB/s ± 0% +50.06% (p=0.000 n=10+9)
Decode/Digits/Speed/1e5-32 126MB/s ± 1% 173MB/s ± 1% +37.05% (p=0.000 n=10+10)
Decode/Digits/Speed/1e6-32 124MB/s ± 0% 171MB/s ± 0% +38.28% (p=0.000 n=9+10)
Decode/Digits/Default/1e4-32 135MB/s ± 1% 201MB/s ± 1% +48.95% (p=0.000 n=10+9)
Decode/Digits/Default/1e5-32 129MB/s ± 1% 173MB/s ± 0% +33.95% (p=0.000 n=10+10)
Decode/Digits/Default/1e6-32 127MB/s ± 1% 171MB/s ± 1% +34.39% (p=0.000 n=10+10)
Decode/Digits/Compression/1e4-32 135MB/s ± 0% 201MB/s ± 0% +48.88% (p=0.000 n=10+10)
Decode/Digits/Compression/1e5-32 129MB/s ± 1% 173MB/s ± 0% +34.17% (p=0.000 n=10+10)
Decode/Digits/Compression/1e6-32 128MB/s ± 1% 172MB/s ± 0% +34.39% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e4-32 137MB/s ± 0% 219MB/s ± 1% +59.96% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e5-32 140MB/s ± 1% 212MB/s ± 1% +51.32% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e6-32 141MB/s ± 0% 213MB/s ± 1% +51.46% (p=0.000 n=10+10)
Decode/Newton/Speed/1e4-32 149MB/s ± 1% 220MB/s ± 1% +47.48% (p=0.000 n=10+10)
Decode/Newton/Speed/1e5-32 162MB/s ± 1% 224MB/s ± 0% +37.92% (p=0.000 n=10+10)
Decode/Newton/Speed/1e6-32 162MB/s ± 0% 222MB/s ± 0% +36.95% (p=0.000 n=10+10)
Decode/Newton/Default/1e4-32 165MB/s ± 0% 253MB/s ± 0% +53.47% (p=0.000 n=10+10)
Decode/Newton/Default/1e5-32 203MB/s ± 0% 278MB/s ± 0% +36.68% (p=0.000 n=10+10)
Decode/Newton/Default/1e6-32 205MB/s ± 1% 279MB/s ± 0% +35.77% (p=0.000 n=10+10)
Decode/Newton/Compression/1e4-32 164MB/s ± 1% 253MB/s ± 1% +53.66% (p=0.000 n=10+10)
Decode/Newton/Compression/1e5-32 204MB/s ± 1% 280MB/s ± 1% +37.41% (p=0.000 n=10+10)
Decode/Newton/Compression/1e6-32 206MB/s ± 0% 280MB/s ± 0% +35.44% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
Decode/Digits/Huffman/1e4-32 0.00B ±NaN% 16.00B ± 0% +Inf% (p=0.000 n=10+10)
Decode/Digits/Huffman/1e5-32 6.00B ± 0% 36.00B ± 0% +500.00% (p=0.000 n=10+10)
Decode/Digits/Huffman/1e6-32 64.0B ± 0% 304.0B ± 0% +375.00% (p=0.000 n=10+9)
Decode/Digits/Speed/1e4-32 80.0B ± 0% 16.0B ± 0% -80.00% (p=0.000 n=10+10)
Decode/Digits/Speed/1e5-32 296B ± 0% 39B ± 0% -86.82% (p=0.000 n=10+8)
Decode/Digits/Speed/1e6-32 3.78kB ± 0% 0.33kB ± 0% -91.29% (p=0.000 n=10+10)
Decode/Digits/Default/1e4-32 40.0B ± 0% 16.0B ± 0% -60.00% (p=0.000 n=10+10)
Decode/Digits/Default/1e5-32 287B ± 0% 54B ± 1% -81.04% (p=0.000 n=10+10)
Decode/Digits/Default/1e6-32 4.15kB ± 0% 0.44kB ± 0% -89.43% (p=0.000 n=10+9)
Decode/Digits/Compression/1e4-32 40.0B ± 0% 16.0B ± 0% -60.00% (p=0.000 n=10+10)
Decode/Digits/Compression/1e5-32 288B ± 0% 55B ± 1% -81.01% (p=0.000 n=10+10)
Decode/Digits/Compression/1e6-32 4.15kB ± 0% 0.44kB ± 0% -89.43% (p=0.000 n=10+8)
Decode/Newton/Huffman/1e4-32 705B ± 0% 16B ± 0% -97.73% (p=0.000 n=9+10)
Decode/Newton/Huffman/1e5-32 4.49kB ± 0% 0.04kB ± 0% -99.15% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e6-32 39.4kB ± 0% 0.3kB ± 0% -99.18% (p=0.000 n=9+10)
Decode/Newton/Speed/1e4-32 617B ± 0% 16B ± 0% -97.41% (p=0.000 n=10+10)
Decode/Newton/Speed/1e5-32 3.19kB ± 0% 0.04kB ± 0% -98.84% (p=0.000 n=10+10)
Decode/Newton/Speed/1e6-32 40.5kB ± 0% 0.3kB ± 0% -99.15% (p=0.000 n=10+10)
Decode/Newton/Default/1e4-32 513B ± 0% 16B ± 0% -96.88% (p=0.000 n=10+10)
Decode/Newton/Default/1e5-32 2.35kB ± 0% 0.04kB ± 0% -98.47% (p=0.000 n=10+10)
Decode/Newton/Default/1e6-32 21.1kB ± 0% 0.3kB ± 0% -98.80% (p=0.000 n=8+8)
Decode/Newton/Compression/1e4-32 513B ± 0% 16B ± 0% -96.88% (p=0.000 n=10+10)
Decode/Newton/Compression/1e5-32 2.35kB ± 0% 0.04kB ± 0% -98.47% (p=0.000 n=10+10)
Decode/Newton/Compression/1e6-32 22.9kB ± 0% 0.2kB ± 0% -98.92% (p=0.000 n=8+8)
name old allocs/op new allocs/op delta
Decode/Digits/Huffman/1e4-32 0.00 ±NaN% 1.00 ± 0% +Inf% (p=0.000 n=10+10)
Decode/Digits/Huffman/1e5-32 0.00 ±NaN% 2.00 ± 0% +Inf% (p=0.000 n=10+10)
Decode/Digits/Huffman/1e6-32 0.00 ±NaN% 16.00 ± 0% +Inf% (p=0.000 n=10+10)
Decode/Digits/Speed/1e4-32 3.00 ± 0% 1.00 ± 0% -66.67% (p=0.000 n=10+10)
Decode/Digits/Speed/1e5-32 6.00 ± 0% 2.00 ± 0% -66.67% (p=0.000 n=10+10)
Decode/Digits/Speed/1e6-32 68.0 ± 0% 16.0 ± 0% -76.47% (p=0.000 n=10+10)
Decode/Digits/Default/1e4-32 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10)
Decode/Digits/Default/1e5-32 8.00 ± 0% 3.00 ± 0% -62.50% (p=0.000 n=10+10)
Decode/Digits/Default/1e6-32 74.0 ± 0% 23.0 ± 0% -68.92% (p=0.000 n=10+10)
Decode/Digits/Compression/1e4-32 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10)
Decode/Digits/Compression/1e5-32 8.00 ± 0% 3.00 ± 0% -62.50% (p=0.000 n=10+10)
Decode/Digits/Compression/1e6-32 74.0 ± 0% 23.0 ± 0% -68.92% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e4-32 9.00 ± 0% 1.00 ± 0% -88.89% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e5-32 18.0 ± 0% 2.0 ± 0% -88.89% (p=0.000 n=10+10)
Decode/Newton/Huffman/1e6-32 156 ± 0% 16 ± 0% -89.74% (p=0.000 n=10+10)
Decode/Newton/Speed/1e4-32 13.0 ± 0% 1.0 ± 0% -92.31% (p=0.000 n=10+10)
Decode/Newton/Speed/1e5-32 26.0 ± 0% 2.0 ± 0% -92.31% (p=0.000 n=10+10)
Decode/Newton/Speed/1e6-32 223 ± 0% 16 ± 0% -92.83% (p=0.000 n=10+10)
Decode/Newton/Default/1e4-32 10.0 ± 0% 1.0 ± 0% -90.00% (p=0.000 n=10+10)
Decode/Newton/Default/1e5-32 27.0 ± 0% 2.0 ± 0% -92.59% (p=0.000 n=10+10)
Decode/Newton/Default/1e6-32 153 ± 0% 12 ± 0% -92.16% (p=0.000 n=10+10)
Decode/Newton/Compression/1e4-32 10.0 ± 0% 1.0 ± 0% -90.00% (p=0.000 n=10+10)
Decode/Newton/Compression/1e5-32 27.0 ± 0% 2.0 ± 0% -92.59% (p=0.000 n=10+10)
Decode/Newton/Compression/1e6-32 145 ± 0% 12 ± 0% -91.72% (p=0.000 n=10+10)
```
These changes have been included in github.com/klauspost/compress
for a little more than a month now, which includes fuzz testing.
Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
GitHub-Last-Rev: ae9b62a9851eb3d8acf14cebfef1d8c663e407dc
GitHub-Pull-Request: golang/go#38324
---
A src/compress/flate/gen_inflate.go
M src/compress/flate/inflate.go
A src/compress/flate/inflate_gen.go
M src/compress/flate/reader_test.go
4 files changed, 1,665 insertions(+), 186 deletions(-)
2 comments:
Patchset:
> Sure. I have an additional 5-10% I found since then, which I will see if I can incorporate. […]
I updated the PR with the latest version as well as resolving the conflict. This matches https://github.com/klauspost/compress/tree/master/flate
Benchmark numbers updated.
Patchset:
New changes integrated. Benchmark updated compared to current master.
Gerrit Bot uploaded patch set #6 to this change.
Change-Id: I7e346330512116baa27e448aa606a2f4e551054c
GitHub-Last-Rev: 161f02171c05a37751c7ca314daf5e5b4809905f
GitHub-Pull-Request: golang/go#38324
---
A src/compress/flate/gen_inflate.go
M src/compress/flate/inflate.go
A src/compress/flate/inflate_gen.go
M src/compress/flate/reader_test.go
4 files changed, 1,653 insertions(+), 186 deletions(-)
1 comment:
Patchset:
Ping Joe Tsai: do you have any time to look at this one? Thanks.
1 comment:
Patchset:
This is a lot of code to work around the fact that the API guarantees that we never read a single byte past the final block IF the input reader happens to implement `io.ByteReader`.
Most usages do not actually pass an `io.ByteReader`, and so the implementation implicitly wraps it within a `bufio.Reader`. There seems to be an impedance mismatch in the complexity of this code and what happens in practice.
I argue that we should instead specialize for `bufio.Reader` and make use of the `bufio.Reader.{Buffered,Peek,Discard}` methods to fetch a chunk of the upcoming data. This avoids the tricky logic checking for the final block and computing the maximum number of bits we are allowed to read. This approach is what's taken in https://github.com/dsnet/compress/blob/f66993602bf5da07ef49d35b08e7264ae9fe2b6e/internal/prefix/reader.go#L253-L317 and results in performance gains comparable to what's reported here for much less code complexity.
We already have a dependency on `bufio.Reader`, so special-casing that type seems palatable. Anyone wanting better performance can just wrap their input in a `bufio.Reader`.
Adding a dependency on "bytes" and "strings" is expensive and shouldn't be done lightly. See https://github.com/golang/go/issues/54098. Until that issue is resolved, we shouldn't be adding dependencies on these if it's not necessary.
Attention is currently required from: Ian Lance Taylor.
1 comment:
Patchset:
Ping Joe Tsai: do you have any time to look at this one? Thanks.
It seems that I never got the email since I'm no longer reachable at joe...@google.com :|
Patchset:
This is a lot of code to work around the fact that the API guarantees that we never read a single by […]
It seems that "bufio" already depends on "bytes" and "strings", so my last point isn't valid, but the rest of my argument still stands.
If we want to optimize `bytes.Buffer`, `strings.Reader`, and `bytes.Reader` with my proposed approach, we can wrap them in a way that presents an API similar to `bufio.Reader`: https://github.com/dsnet/compress/blob/master/internal/prefix/wrap.go
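The wrapping idea might look like the sketch below: expose just the few `bufio.Reader` methods the decoder needs behind one interface, with thin adapters for the non-`bufio` types. The `peeker` interface and `stringWrap` adapter are hypothetical names, not the wrap.go code:

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// peeker captures the three bufio.Reader methods the decoder would
// use. *bufio.Reader satisfies it directly; other source types get
// thin adapters. Hypothetical interface, sketching the wrap.go idea.
type peeker interface {
	Buffered() int
	Peek(n int) ([]byte, error)
	Discard(n int) (int, error)
}

// stringWrap adapts an in-memory string source by tracking an
// offset, so Peek and Discard become simple slicing.
type stringWrap struct {
	s   string
	off int
}

func (w *stringWrap) Buffered() int { return len(w.s) - w.off }

func (w *stringWrap) Peek(n int) ([]byte, error) {
	if n > w.Buffered() {
		return []byte(w.s[w.off:]), io.EOF
	}
	return []byte(w.s[w.off : w.off+n]), nil
}

func (w *stringWrap) Discard(n int) (int, error) {
	if n > w.Buffered() {
		n = w.Buffered()
	}
	w.off += n
	return n, nil
}

func main() {
	// Both a real *bufio.Reader and the adapter drive the same path.
	readers := []peeker{
		bufio.NewReader(bytes.NewReader([]byte("abc"))),
		&stringWrap{s: "abc"},
	}
	for _, r := range readers {
		b, _ := r.Peek(2)
		r.Discard(1)
		fmt.Printf("%s %d\n", b, r.Buffered())
	}
}
```

With this shape, a single decoder body written against `peeker` covers all the in-memory types, instead of generating one decoder per concrete type.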
I honestly don't feel like doing a total rewrite, and this is mostly a verbatim copy of well-tested code.
Most usages do not actually pass an `io.ByteReader`.
HTTP requests do, which is by far the most common usage.
Most usages do not actually pass an io.ByteReader, and so the implementation implicitly wraps it within a bufio.Reader.
Actually, it will also speed these up, since the wrapping `bufio.Reader` is currently never accessed directly, but only through the `(decompressor).r` interface.
I have attempted to do a generic implementation, but it is something like 10-20% slower than this: https://github.com/klauspost/compress/commit/355740f356693049e3b107fb140a5a4813cf6397
It doesn't really change the imports either.