compress/flate: improve compression speed
Fixes #75532
This improves the compression speed of the flate package.
This is a cleaned version of github.com/klauspost/compress/flate
Overall changes:
* Compression levels 2-6 are custom implementations.
* Compression levels 7-9 are tweaked to match levels 2-6, with minor improvements.
* Tokens are encoded and indexed when added.
* Huffman encoding attempts to continue blocks instead of always starting a new one.
* Loads/stores are in separate functions and can be made to use unsafe.
Overall, this attempts to better balance the compression levels,
which tended to have little spread at the top.
The intention is to place "default" at the place where performance drops off
considerably without a proportional improvement in compression ratio.
In my package I have set "5" to be the default, but this keeps it at level 6.
"Unsafe" operations have been removed for now.
They can trivially be added back.
Leaving them out costs approximately 10% in speed.
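As a rough sketch of what the load helpers look like (the safe variant here is an assumption based on the loadLE32 calls visible in the diff; an unsafe variant would replace the bounds-checked read with an unsafe.Pointer load):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// loadLE32 reads a little-endian uint32 starting at b[i].
// This is the safe, bounds-checked variant; an "unsafe" variant
// would read through an unsafe.Pointer to skip the bounds check.
func loadLE32(b []byte, i int) uint32 {
	return binary.LittleEndian.Uint32(b[i:])
}

func main() {
	b := []byte{0x01, 0x02, 0x03, 0x04, 0x05}
	fmt.Printf("%#x\n", loadLE32(b, 0)) // 0x4030201
	fmt.Printf("%#x\n", loadLE32(b, 1)) // 0x5040302
}
```

Keeping the loads in one small function is what makes it trivial to swap the safe implementation for an unsafe one later.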
Benchmarks using the standard library's benchmark suite are included below.
I do not think these are a particularly good representation of different
data types, so I have also run benchmarks on various data types.
I have compiled the benchmark results at https://stdeflate.klauspost.com/
The main focus has been on level 1 (fastest),
level 5+6 (default) and level 9 (smallest).
It is quite rare that levels outside of these are used, but they should
still fill their roles reasonably well.
Level 9 will attempt more aggressive compression,
but will also typically be slightly slower than before.
I hope the graphs above show that focusing on a few data types
doesn't always give the full picture.
My own observations:
Levels 1 and 2 often "trade places" depending on the data type.
Since level 1 usually compresses the least of the two -
while being slightly faster, with lower memory usage -
it is placed as the lowest.
The switchover between level 6 and 7 is not always smooth,
since the search method changes significantly.
Random data is now ~100x faster on levels 2-6, and ~3x faster on levels 7-9.
You can feed pre-compressed data with no significant speed penalty.
benchmark old ns/op new ns/op delta
BenchmarkEncode/Digits/Huffman/1e4-32 11431 8001 -30.01%
BenchmarkEncode/Digits/Huffman/1e5-32 123175 74780 -39.29%
BenchmarkEncode/Digits/Huffman/1e6-32 1260402 750022 -40.49%
BenchmarkEncode/Digits/Speed/1e4-32 35100 23758 -32.31%
BenchmarkEncode/Digits/Speed/1e5-32 675355 385954 -42.85%
BenchmarkEncode/Digits/Speed/1e6-32 6878375 4873784 -29.14%
BenchmarkEncode/Digits/Default/1e4-32 63411 40974 -35.38%
BenchmarkEncode/Digits/Default/1e5-32 1815762 801563 -55.86%
BenchmarkEncode/Digits/Default/1e6-32 18875894 8101836 -57.08%
BenchmarkEncode/Digits/Compression/1e4-32 63859 85275 +33.54%
BenchmarkEncode/Digits/Compression/1e5-32 1803745 2752174 +52.58%
BenchmarkEncode/Digits/Compression/1e6-32 18931995 30727403 +62.30%
BenchmarkEncode/Newton/Huffman/1e4-32 15770 11108 -29.56%
BenchmarkEncode/Newton/Huffman/1e5-32 134567 85103 -36.76%
BenchmarkEncode/Newton/Huffman/1e6-32 1663889 1030186 -38.09%
BenchmarkEncode/Newton/Speed/1e4-32 32749 22934 -29.97%
BenchmarkEncode/Newton/Speed/1e5-32 565609 336750 -40.46%
BenchmarkEncode/Newton/Speed/1e6-32 5996011 3815437 -36.37%
BenchmarkEncode/Newton/Default/1e4-32 70505 34148 -51.57%
BenchmarkEncode/Newton/Default/1e5-32 2374066 570673 -75.96%
BenchmarkEncode/Newton/Default/1e6-32 24562355 5975917 -75.67%
BenchmarkEncode/Newton/Compression/1e4-32 71505 77670 +8.62%
BenchmarkEncode/Newton/Compression/1e5-32 3345768 3730804 +11.51%
BenchmarkEncode/Newton/Compression/1e6-32 35770364 39768939 +11.18%
benchmark old MB/s new MB/s speedup
BenchmarkEncode/Digits/Huffman/1e4-32 874.80 1249.91 1.43x
BenchmarkEncode/Digits/Huffman/1e5-32 811.86 1337.25 1.65x
BenchmarkEncode/Digits/Huffman/1e6-32 793.40 1333.29 1.68x
BenchmarkEncode/Digits/Speed/1e4-32 284.90 420.91 1.48x
BenchmarkEncode/Digits/Speed/1e5-32 148.07 259.10 1.75x
BenchmarkEncode/Digits/Speed/1e6-32 145.38 205.18 1.41x
BenchmarkEncode/Digits/Default/1e4-32 157.70 244.06 1.55x
BenchmarkEncode/Digits/Default/1e5-32 55.07 124.76 2.27x
BenchmarkEncode/Digits/Default/1e6-32 52.98 123.43 2.33x
BenchmarkEncode/Digits/Compression/1e4-32 156.59 117.27 0.75x
BenchmarkEncode/Digits/Compression/1e5-32 55.44 36.33 0.66x
BenchmarkEncode/Digits/Compression/1e6-32 52.82 32.54 0.62x
BenchmarkEncode/Newton/Huffman/1e4-32 634.13 900.25 1.42x
BenchmarkEncode/Newton/Huffman/1e5-32 743.12 1175.04 1.58x
BenchmarkEncode/Newton/Huffman/1e6-32 601.00 970.70 1.62x
BenchmarkEncode/Newton/Speed/1e4-32 305.35 436.03 1.43x
BenchmarkEncode/Newton/Speed/1e5-32 176.80 296.96 1.68x
BenchmarkEncode/Newton/Speed/1e6-32 166.78 262.09 1.57x
BenchmarkEncode/Newton/Default/1e4-32 141.83 292.84 2.06x
BenchmarkEncode/Newton/Default/1e5-32 42.12 175.23 4.16x
BenchmarkEncode/Newton/Default/1e6-32 40.71 167.34 4.11x
BenchmarkEncode/Newton/Compression/1e4-32 139.85 128.75 0.92x
BenchmarkEncode/Newton/Compression/1e5-32 29.89 26.80 0.90x
BenchmarkEncode/Newton/Compression/1e6-32 27.96 25.15 0.90x
Static Memory Usage:
Before:
Level -2: Memory Used: 704KB, 8 allocs
Level -1: Memory Used: 776KB, 7 allocs
Level 0: Memory Used: 704KB, 7 allocs
Level 1: Memory Used: 1160KB, 13 allocs
Level 2: Memory Used: 776KB, 8 allocs
Level 3: Memory Used: 776KB, 8 allocs
Level 4: Memory Used: 776KB, 8 allocs
Level 5: Memory Used: 776KB, 8 allocs
Level 6: Memory Used: 776KB, 8 allocs
Level 7: Memory Used: 776KB, 8 allocs
Level 8: Memory Used: 776KB, 9 allocs
Level 9: Memory Used: 776KB, 8 allocs
After:
Level -2: Memory Used: 272KB, 12 allocs
Level -1: Memory Used: 1016KB, 7 allocs
Level 0: Memory Used: 304KB, 6 allocs
Level 1: Memory Used: 760KB, 13 allocs
Level 2: Memory Used: 1144KB, 8 allocs
Level 3: Memory Used: 1144KB, 8 allocs
Level 4: Memory Used: 888KB, 14 allocs
Level 5: Memory Used: 1016KB, 8 allocs
Level 6: Memory Used: 1016KB, 8 allocs
Level 7: Memory Used: 952KB, 7 allocs
Level 8: Memory Used: 952KB, 7 allocs
Level 9: Memory Used: 1080KB, 9 allocs
This package has been fuzz tested for about 24 hours.
Currently, there is about 1h between new "interesting" finds.
diff --git a/src/compress/flate/deflate.go b/src/compress/flate/deflate.go
index 6697f3a..3819f2e 100644
--- a/src/compress/flate/deflate.go
+++ b/src/compress/flate/deflate.go
@@ -27,132 +27,121 @@
// RFC 1951 compliant. That is, any valid DEFLATE decompressor will
// continue to be able to decompress this output.
HuffmanOnly = -2
-)
-const (
- logWindowSize = 15
- windowSize = 1 << logWindowSize
- windowMask = windowSize - 1
+ logWindowSize = 15
+ windowSize = 1 << logWindowSize
+ windowMask = windowSize - 1
+ minMatchLength = 4 // The smallest match that the compressor looks for
+ maxMatchLength = 258 // The longest match for the compressor
+ minOffsetSize = 1 // The shortest offset that makes any sense
- // The LZ77 step produces a sequence of literal tokens and <length, offset>
- // pair tokens. The offset is also known as distance. The underlying wire
- // format limits the range of lengths and offsets. For example, there are
- // 256 legitimate lengths: those in the range [3, 258]. This package's
- // compressor uses a higher minimum match length, enabling optimizations
- // such as finding matches via 32-bit loads and compares.
- baseMatchLength = 3 // The smallest match length per the RFC section 3.2.5
- minMatchLength = 4 // The smallest match length that the compressor actually emits
- maxMatchLength = 258 // The largest match length
- baseMatchOffset = 1 // The smallest match offset
- maxMatchOffset = 1 << 15 // The largest match offset
-
- // The maximum number of tokens we put into a single flate block, just to
- // stop things from getting too large.
- maxFlateBlockTokens = 1 << 14
+ // The maximum number of tokens we will encode at a time.
+ // Smaller sizes usually create less optimal blocks.
+ // Bigger can make context switching slow.
+ // We use this for levels 7-9, so we make it big.
+ maxFlateBlockTokens = 1 << 15
maxStoreBlockSize = 65535
hashBits = 17 // After 17 performance degrades
hashSize = 1 << hashBits
hashMask = (1 << hashBits) - 1
- maxHashOffset = 1 << 24
+ maxHashOffset = 1 << 28
skipNever = math.MaxInt32
)
type compressionLevel struct {
- level, good, lazy, nice, chain, fastSkipHashing int
+ good, lazy, nice, chain, level int
}
var levels = []compressionLevel{
- {0, 0, 0, 0, 0, 0}, // NoCompression.
- {1, 0, 0, 0, 0, 0}, // BestSpeed uses a custom algorithm; see deflatefast.go.
- // For levels 2-3 we don't bother trying with lazy matches.
- {2, 4, 0, 16, 8, 5},
- {3, 4, 0, 32, 32, 6},
- // Levels 4-9 use increasingly more lazy matching
+ {}, // 0
+ // Levels 1-6 use specialized algorithms - values not used
+ {0, 0, 0, 0, 1},
+ {0, 0, 0, 0, 2},
+ {0, 0, 0, 0, 3},
+ {0, 0, 0, 0, 4},
+ {0, 0, 0, 0, 5},
+ {0, 0, 0, 0, 6},
+ // Levels 7-9 use increasingly more lazy matching
// and increasingly stringent conditions for "good enough".
- {4, 4, 4, 16, 16, skipNever},
- {5, 8, 16, 32, 32, skipNever},
- {6, 8, 16, 128, 128, skipNever},
- {7, 8, 32, 128, 256, skipNever},
- {8, 32, 128, 258, 1024, skipNever},
- {9, 32, 258, 258, 4096, skipNever},
+ {8, 12, 16, 24, 7},
+ {16, 30, 40, 64, 8},
+ {32, 258, 258, 1024, 9},
}
-type compressor struct {
- compressionLevel
+// advancedState contains state for the advanced levels, with bigger hash tables, etc.
+type advancedState struct {
+ // deflate state
+ length int
+ offset int
+ maxInsertIndex int
+ chainHead int
+ hashOffset int
- w *huffmanBitWriter
- bulkHasher func([]byte, []uint32)
+ ii uint16 // position of last match, intended to overflow to reset.
- // compression algorithm
- fill func(*compressor, []byte) int // copy data to window
- step func(*compressor) // process window
- bestSpeed *deflateFast // Encoder for BestSpeed
+ // input window: unprocessed data is window[index:windowEnd]
+ index int
+ hashMatch [maxMatchLength + minMatchLength]uint32
// Input hash chains
// hashHead[hashValue] contains the largest inputIndex with the specified hash value
// If hashHead[hashValue] is within the current window, then
// hashPrev[hashHead[hashValue] & windowMask] contains the previous index
// with the same hash value.
- chainHead int
- hashHead [hashSize]uint32
- hashPrev [windowSize]uint32
- hashOffset int
+ hashHead [hashSize]uint32
+ hashPrev [windowSize]uint32
+}
- // input window: unprocessed data is window[index:windowEnd]
- index int
- window []byte
- windowEnd int
- blockStart int // window index where current tokens start
- byteAvailable bool // if true, still need to process window[index-1].
+type compressor struct {
+ compressionLevel
- sync bool // requesting flush
+ h *huffmanEncoder
+ w *huffmanBitWriter
+
+ // compression algorithm
+ fill func(*compressor, []byte) int // copy data to window
+ step func(*compressor) // process window
+
+ window []byte
+ windowEnd int
+ blockStart int // window index where current tokens start
+ err error
// queued output tokens
- tokens []token
+ tokens tokens
+ fast fastEnc
+ state *advancedState
- // deflate state
- length int
- offset int
- maxInsertIndex int
- err error
-
- // hashMatch must be able to contain hashes for the maximum match length.
- hashMatch [maxMatchLength - 1]uint32
+ sync bool // requesting flush
+ byteAvailable bool // if true, still need to process window[index-1].
}
func (d *compressor) fillDeflate(b []byte) int {
- if d.index >= 2*windowSize-(minMatchLength+maxMatchLength) {
+ s := d.state
+ if s.index >= 2*windowSize-(minMatchLength+maxMatchLength) {
// shift the window by windowSize
- copy(d.window, d.window[windowSize:2*windowSize])
- d.index -= windowSize
+ //copy(d.window[:], d.window[windowSize:2*windowSize])
+ *(*[windowSize]byte)(d.window) = *(*[windowSize]byte)(d.window[windowSize:])
+ s.index -= windowSize
d.windowEnd -= windowSize
if d.blockStart >= windowSize {
d.blockStart -= windowSize
} else {
d.blockStart = math.MaxInt32
}
- d.hashOffset += windowSize
- if d.hashOffset > maxHashOffset {
- delta := d.hashOffset - 1
- d.hashOffset -= delta
- d.chainHead -= delta
-
+ s.hashOffset += windowSize
+ if s.hashOffset > maxHashOffset {
+ delta := s.hashOffset - 1
+ s.hashOffset -= delta
+ s.chainHead -= delta
// Iterate over slices instead of arrays to avoid copying
// the entire table onto the stack (Issue #18625).
- for i, v := range d.hashPrev[:] {
- if int(v) > delta {
- d.hashPrev[i] = uint32(int(v) - delta)
- } else {
- d.hashPrev[i] = 0
- }
+ for i, v := range s.hashPrev[:] {
+ s.hashPrev[i] = uint32(max(int(v)-delta, 0))
}
- for i, v := range d.hashHead[:] {
- if int(v) > delta {
- d.hashHead[i] = uint32(int(v) - delta)
- } else {
- d.hashHead[i] = 0
- }
+ for i, v := range s.hashHead[:] {
+ s.hashHead[i] = uint32(max(int(v)-delta, 0))
}
}
}
@@ -161,14 +150,38 @@
return n
}
-func (d *compressor) writeBlock(tokens []token, index int) error {
- if index > 0 {
+func (d *compressor) writeBlock(tok *tokens, index int, eof bool) error {
+ if index > 0 || eof {
var window []byte
if d.blockStart <= index {
window = d.window[d.blockStart:index]
}
d.blockStart = index
- d.w.writeBlock(tokens, false, window)
+ d.w.writeBlockDynamic(tok, eof, window, d.sync)
+ return d.w.err
+ }
+ return nil
+}
+
+// writeBlockSkip writes the current block and uses the number of tokens
+// to determine if the block should be stored on no matches, or
+// only huffman encoded.
+func (d *compressor) writeBlockSkip(tok *tokens, index int, eof bool) error {
+ if index > 0 || eof {
+ if d.blockStart <= index {
+ window := d.window[d.blockStart:index]
+ // If we removed less than a 64th of all literals,
+ // we huffman compress the block.
+ if int(tok.n) > len(window)-int(tok.n>>6) {
+ d.w.writeBlockHuff(eof, window, d.sync)
+ } else {
+ // Write a dynamic huffman block.
+ d.w.writeBlockDynamic(tok, eof, window, d.sync)
+ }
+ } else {
+ d.w.writeBlock(tok, eof, nil)
+ }
+ d.blockStart = index
return d.w.err
}
return nil
@@ -177,103 +190,139 @@
// fillWindow will fill the current window with the supplied
// dictionary and calculate all hashes.
// This is much faster than doing a full encode.
-// Should only be used after a reset.
+// Should only be used after a start/reset.
func (d *compressor) fillWindow(b []byte) {
- // Do not fill window if we are in store-only mode.
- if d.compressionLevel.level < 2 {
+ // Do not fill window if we are in store-only or huffman mode.
+ if d.level <= 0 {
return
}
- if d.index != 0 || d.windowEnd != 0 {
- panic("internal error: fillWindow called with stale data")
+ if d.fast != nil {
+ // encode the last data, but discard the result
+ if len(b) > maxMatchOffset {
+ b = b[len(b)-maxMatchOffset:]
+ }
+ d.fast.Encode(&d.tokens, b)
+ d.tokens.Reset()
+ return
}
-
+ s := d.state
// If we are given too much, cut it.
if len(b) > windowSize {
b = b[len(b)-windowSize:]
}
// Add all to window.
- n := copy(d.window, b)
+ n := copy(d.window[d.windowEnd:], b)
// Calculate 256 hashes at the time (more L1 cache hits)
loops := (n + 256 - minMatchLength) / 256
- for j := 0; j < loops; j++ {
- index := j * 256
- end := index + 256 + minMatchLength - 1
- if end > n {
- end = n
- }
- toCheck := d.window[index:end]
- dstSize := len(toCheck) - minMatchLength + 1
+ for j := range loops {
+ startindex := j * 256
+ end := min(startindex+256+minMatchLength-1, n)
+ tocheck := d.window[startindex:end]
+ dstSize := len(tocheck) - minMatchLength + 1
if dstSize <= 0 {
continue
}
- dst := d.hashMatch[:dstSize]
- d.bulkHasher(toCheck, dst)
+ dst := s.hashMatch[:dstSize]
+ bulkHash4(tocheck, dst)
+ var newH uint32
for i, val := range dst {
- di := i + index
- hh := &d.hashHead[val&hashMask]
+ di := i + startindex
+ newH = val & hashMask
// Get previous value with the same hash.
// Our chain should point to the previous value.
- d.hashPrev[di&windowMask] = *hh
+ s.hashPrev[di&windowMask] = s.hashHead[newH]
// Set the head of the hash chain to us.
- *hh = uint32(di + d.hashOffset)
+ s.hashHead[newH] = uint32(di + s.hashOffset)
}
}
// Update window information.
- d.windowEnd = n
- d.index = n
+ d.windowEnd += n
+ s.index = n
}
// Try to find a match starting at index whose length is greater than prevSize.
// We only look at chainCount possibilities before giving up.
-func (d *compressor) findMatch(pos int, prevHead int, prevLength int, lookahead int) (length, offset int, ok bool) {
- minMatchLook := maxMatchLength
- if lookahead < minMatchLook {
- minMatchLook = lookahead
- }
+func (d *compressor) findMatch(pos int, prevHead int, lookahead int) (length, offset int, ok bool) {
+ minMatchLook := min(lookahead, maxMatchLength)
win := d.window[0 : pos+minMatchLook]
// We quit when we get a match that's at least nice long
- nice := len(win) - pos
- if d.nice < nice {
- nice = d.nice
- }
+ nice := min(d.nice, len(win)-pos)
// If we've got a match that's good enough, only look in 1/4 the chain.
tries := d.chain
- length = prevLength
- if length >= d.good {
- tries >>= 2
- }
+ length = minMatchLength - 1
wEnd := win[pos+length]
wPos := win[pos:]
- minIndex := pos - windowSize
+ minIndex := max(pos-windowSize, 0)
+ offset = 0
+
+ if d.chain < 100 {
+ for i := prevHead; tries > 0; tries-- {
+ if wEnd == win[i+length] {
+ n := matchLen(win[i:i+minMatchLook], wPos)
+ if n > length {
+ length = n
+ offset = pos - i
+ ok = true
+ if n >= nice {
+ // The match is good enough that we don't try to find a better one.
+ break
+ }
+ wEnd = win[pos+n]
+ }
+ }
+ if i <= minIndex {
+ // hashPrev[i & windowMask] has already been overwritten, so stop now.
+ break
+ }
+ i = int(d.state.hashPrev[i&windowMask]) - d.state.hashOffset
+ if i < minIndex {
+ break
+ }
+ }
+ return
+ }
+
+ // Minimum gain to accept a match.
+ cGain := 4
+
+ // Some like it higher (CSV), some like it lower (JSON)
+ const baseCost = 3
+ // Base is 4 bytes with an additional cost.
+ // Matches must be better than this.
for i := prevHead; tries > 0; tries-- {
if wEnd == win[i+length] {
- n := matchLen(win[i:], wPos, minMatchLook)
+ n := matchLen(win[i:i+minMatchLook], wPos)
+ if n > length {
+ // Calculate gain. Estimates the gains of the new match compared to emitting as literals.
+ newGain := d.h.bitLengthRaw(wPos[:n]) - int(offsetExtraBits[offsetCode(uint32(pos-i))]) - baseCost - int(lengthExtraBits[lengthCodes[(n-3)&255]])
- if n > length && (n > minMatchLength || pos-i <= 4096) {
- length = n
- offset = pos - i
- ok = true
- if n >= nice {
- // The match is good enough that we don't try to find a better one.
- break
+ if newGain > cGain {
+ length = n
+ offset = pos - i
+ cGain = newGain
+ ok = true
+ if n >= nice {
+ // The match is good enough that we don't try to find a better one.
+ break
+ }
+ wEnd = win[pos+n]
}
- wEnd = win[pos+n]
}
}
- if i == minIndex {
+ if i <= minIndex {
// hashPrev[i & windowMask] has already been overwritten, so stop now.
break
}
- i = int(d.hashPrev[i&windowMask]) - d.hashOffset
- if i < minIndex || i < 0 {
+ i = int(d.state.hashPrev[i&windowMask]) - d.state.hashOffset
+ if i < minIndex {
break
}
}
@@ -288,235 +337,272 @@
return d.w.err
}
-const hashmul = 0x1e35a7bd
-
// hash4 returns a hash representation of the first 4 bytes
// of the supplied slice.
// The caller must ensure that len(b) >= 4.
func hash4(b []byte) uint32 {
- return ((uint32(b[3]) | uint32(b[2])<<8 | uint32(b[1])<<16 | uint32(b[0])<<24) * hashmul) >> (32 - hashBits)
+ return hash4u(loadLE32(b, 0), hashBits)
+}
+
+// hash4u returns the hash of u to fit in a hash table with h bits.
+// Preferably h should be a constant and should always be <32.
+func hash4u(u uint32, h uint8) uint32 {
+ return (u * prime4bytes) >> (32 - h)
}
// bulkHash4 will compute hashes using the same
-// algorithm as hash4.
+// algorithm as hash4
func bulkHash4(b []byte, dst []uint32) {
- if len(b) < minMatchLength {
+ if len(b) < 4 {
return
}
- hb := uint32(b[3]) | uint32(b[2])<<8 | uint32(b[1])<<16 | uint32(b[0])<<24
- dst[0] = (hb * hashmul) >> (32 - hashBits)
- end := len(b) - minMatchLength + 1
+ hb := loadLE32(b, 0)
+
+ dst[0] = hash4u(hb, hashBits)
+ end := len(b) - 4 + 1
for i := 1; i < end; i++ {
- hb = (hb << 8) | uint32(b[i+3])
- dst[i] = (hb * hashmul) >> (32 - hashBits)
+ hb = (hb >> 8) | uint32(b[i+3])<<24
+ dst[i] = hash4u(hb, hashBits)
}
}
-// matchLen returns the number of matching bytes in a and b
-// up to length 'max'. Both slices must be at least 'max'
-// bytes in size.
-func matchLen(a, b []byte, max int) int {
- a = a[:max]
- b = b[:len(a)]
- for i, av := range a {
- if b[i] != av {
- return i
- }
- }
- return max
-}
-
-// encSpeed will compress and store the currently added data,
-// if enough has been accumulated or we at the end of the stream.
-// Any error that occurred will be in d.err
-func (d *compressor) encSpeed() {
- // We only compress if we have maxStoreBlockSize.
- if d.windowEnd < maxStoreBlockSize {
- if !d.sync {
- return
- }
-
- // Handle small sizes.
- if d.windowEnd < 128 {
- switch {
- case d.windowEnd == 0:
- return
- case d.windowEnd <= 16:
- d.err = d.writeStoredBlock(d.window[:d.windowEnd])
- default:
- d.w.writeBlockHuff(false, d.window[:d.windowEnd])
- d.err = d.w.err
- }
- d.windowEnd = 0
- d.bestSpeed.reset()
- return
- }
-
- }
- // Encode the block.
- d.tokens = d.bestSpeed.encode(d.tokens[:0], d.window[:d.windowEnd])
-
- // If we removed less than 1/16th, Huffman compress the block.
- if len(d.tokens) > d.windowEnd-(d.windowEnd>>4) {
- d.w.writeBlockHuff(false, d.window[:d.windowEnd])
- } else {
- d.w.writeBlockDynamic(d.tokens, false, d.window[:d.windowEnd])
- }
- d.err = d.w.err
- d.windowEnd = 0
-}
-
func (d *compressor) initDeflate() {
d.window = make([]byte, 2*windowSize)
- d.hashOffset = 1
- d.tokens = make([]token, 0, maxFlateBlockTokens+1)
- d.length = minMatchLength - 1
- d.offset = 0
d.byteAvailable = false
- d.index = 0
- d.chainHead = -1
- d.bulkHasher = bulkHash4
-}
-
-func (d *compressor) deflate() {
- if d.windowEnd-d.index < minMatchLength+maxMatchLength && !d.sync {
+ d.err = nil
+ if d.state == nil {
return
}
+ s := d.state
+ s.index = 0
+ s.hashOffset = 1
+ s.length = minMatchLength - 1
+ s.offset = 0
+ s.chainHead = -1
+}
- d.maxInsertIndex = d.windowEnd - (minMatchLength - 1)
+// deflateLazy does encoding with lazy matching.
+func (d *compressor) deflateLazy() {
+ s := d.state
-Loop:
- for {
- if d.index > d.windowEnd {
- panic("index > windowEnd")
+ if d.windowEnd-s.index < minMatchLength+maxMatchLength && !d.sync {
+ return
+ }
+ if d.windowEnd != s.index && d.chain > 100 {
+ // Get literal huffman coder.
+ // This is used to estimate the cost of emitting a literal.
+ if d.h == nil {
+ d.h = newHuffmanEncoder(maxFlateBlockTokens)
}
- lookahead := d.windowEnd - d.index
+ var tmp [256]uint16
+ for _, v := range d.window[s.index:d.windowEnd] {
+ tmp[v]++
+ }
+ d.h.generate(tmp[:], 15)
+ }
+
+ s.maxInsertIndex = d.windowEnd - (minMatchLength - 1)
+
+ for {
+ lookahead := d.windowEnd - s.index
if lookahead < minMatchLength+maxMatchLength {
if !d.sync {
- break Loop
- }
- if d.index > d.windowEnd {
- panic("index > windowEnd")
+ return
}
if lookahead == 0 {
// Flush current output block if any.
if d.byteAvailable {
// There is still one pending token that needs to be flushed
- d.tokens = append(d.tokens, literalToken(uint32(d.window[d.index-1])))
+ d.tokens.AddLiteral(d.window[s.index-1])
d.byteAvailable = false
}
- if len(d.tokens) > 0 {
- if d.err = d.writeBlock(d.tokens, d.index); d.err != nil {
+ if d.tokens.n > 0 {
+ if d.err = d.writeBlock(&d.tokens, s.index, false); d.err != nil {
return
}
- d.tokens = d.tokens[:0]
+ d.tokens.Reset()
}
- break Loop
+ return
}
}
- if d.index < d.maxInsertIndex {
+ if s.index < s.maxInsertIndex {
// Update the hash
- hash := hash4(d.window[d.index : d.index+minMatchLength])
- hh := &d.hashHead[hash&hashMask]
- d.chainHead = int(*hh)
- d.hashPrev[d.index&windowMask] = uint32(d.chainHead)
- *hh = uint32(d.index + d.hashOffset)
+ hash := hash4(d.window[s.index:])
+ ch := s.hashHead[hash]
+ s.chainHead = int(ch)
+ s.hashPrev[s.index&windowMask] = ch
+ s.hashHead[hash] = uint32(s.index + s.hashOffset)
}
- prevLength := d.length
- prevOffset := d.offset
- d.length = minMatchLength - 1
- d.offset = 0
- minIndex := d.index - windowSize
- if minIndex < 0 {
- minIndex = 0
+ prevLength := s.length
+ prevOffset := s.offset
+ s.length = minMatchLength - 1
+ s.offset = 0
+ minIndex := max(s.index-windowSize, 0)
+
+ if s.chainHead-s.hashOffset >= minIndex && lookahead > prevLength && prevLength < d.lazy {
+ if newLength, newOffset, ok := d.findMatch(s.index, s.chainHead-s.hashOffset, lookahead); ok {
+ s.length = newLength
+ s.offset = newOffset
+ }
}
- if d.chainHead-d.hashOffset >= minIndex &&
- (d.fastSkipHashing != skipNever && lookahead > minMatchLength-1 ||
- d.fastSkipHashing == skipNever && lookahead > prevLength && prevLength < d.lazy) {
- if newLength, newOffset, ok := d.findMatch(d.index, d.chainHead-d.hashOffset, minMatchLength-1, lookahead); ok {
- d.length = newLength
- d.offset = newOffset
+ if prevLength >= minMatchLength && s.length <= prevLength {
+ // No better match, but check for better match at end...
+ //
+ // Skip forward a number of bytes.
+ // Offset of 2 seems to yield the best results. 3 is sometimes better.
+ const checkOff = 2
+
+ // Check all, except full length
+ if prevLength < maxMatchLength-checkOff {
+ prevIndex := s.index - 1
+ if prevIndex+prevLength < s.maxInsertIndex {
+ end := min(lookahead, maxMatchLength+checkOff)
+ end += prevIndex
+
+ // Hash at match end.
+ h := hash4(d.window[prevIndex+prevLength:])
+ ch2 := int(s.hashHead[h]) - s.hashOffset - prevLength
+ if prevIndex-ch2 != prevOffset && ch2 > minIndex+checkOff {
+ length := matchLen(d.window[prevIndex+checkOff:end], d.window[ch2+checkOff:])
+ // It seems like a pure length metric is best.
+ if length > prevLength {
+ prevLength = length
+ prevOffset = prevIndex - ch2
+
+ // Extend back...
+ for i := checkOff - 1; i >= 0; i-- {
+ if prevLength >= maxMatchLength || d.window[prevIndex+i] != d.window[ch2+i] {
+ // Emit tokens we "owe"
+ for j := 0; j <= i; j++ {
+ d.tokens.AddLiteral(d.window[prevIndex+j])
+ if d.tokens.n == maxFlateBlockTokens {
+ // The block includes the current character
+ if d.err = d.writeBlock(&d.tokens, s.index, false); d.err != nil {
+ return
+ }
+ d.tokens.Reset()
+ }
+ s.index++
+ if s.index < s.maxInsertIndex {
+ h := hash4(d.window[s.index:])
+ ch := s.hashHead[h]
+ s.chainHead = int(ch)
+ s.hashPrev[s.index&windowMask] = ch
+ s.hashHead[h] = uint32(s.index + s.hashOffset)
+ }
+ }
+ break
+ } else {
+ prevLength++
+ }
+ }
+ }
+ }
+ }
}
- }
- if d.fastSkipHashing != skipNever && d.length >= minMatchLength ||
- d.fastSkipHashing == skipNever && prevLength >= minMatchLength && d.length <= prevLength {
// There was a match at the previous step, and the current match is
// not better. Output the previous match.
- if d.fastSkipHashing != skipNever {
- d.tokens = append(d.tokens, matchToken(uint32(d.length-baseMatchLength), uint32(d.offset-baseMatchOffset)))
- } else {
- d.tokens = append(d.tokens, matchToken(uint32(prevLength-baseMatchLength), uint32(prevOffset-baseMatchOffset)))
- }
+ d.tokens.AddMatch(uint32(prevLength-3), uint32(prevOffset-minOffsetSize))
+
// Insert in the hash table all strings up to the end of the match.
// index and index-1 are already inserted. If there is not enough
// lookahead, the last two strings are not inserted into the hash
// table.
- if d.length <= d.fastSkipHashing {
- var newIndex int
- if d.fastSkipHashing != skipNever {
- newIndex = d.index + d.length
- } else {
- newIndex = d.index + prevLength - 1
+ newIndex := s.index + prevLength - 1
+ // Calculate missing hashes
+ end := min(newIndex, s.maxInsertIndex)
+ end += minMatchLength - 1
+ startindex := min(s.index+1, s.maxInsertIndex)
+ tocheck := d.window[startindex:end]
+ dstSize := len(tocheck) - minMatchLength + 1
+ if dstSize > 0 {
+ dst := s.hashMatch[:dstSize]
+ bulkHash4(tocheck, dst)
+ var newH uint32
+ for i, val := range dst {
+ di := i + startindex
+ newH = val & hashMask
+ // Get previous value with the same hash.
+ // Our chain should point to the previous value.
+ s.hashPrev[di&windowMask] = s.hashHead[newH]
+ // Set the head of the hash chain to us.
+ s.hashHead[newH] = uint32(di + s.hashOffset)
}
- index := d.index
- for index++; index < newIndex; index++ {
- if index < d.maxInsertIndex {
- hash := hash4(d.window[index : index+minMatchLength])
- // Get previous value with the same hash.
- // Our chain should point to the previous value.
- hh := &d.hashHead[hash&hashMask]
- d.hashPrev[index&windowMask] = *hh
- // Set the head of the hash chain to us.
- *hh = uint32(index + d.hashOffset)
- }
- }
- d.index = index
-
- if d.fastSkipHashing == skipNever {
- d.byteAvailable = false
- d.length = minMatchLength - 1
- }
- } else {
- // For matches this long, we don't bother inserting each individual
- // item into the table.
- d.index += d.length
}
- if len(d.tokens) == maxFlateBlockTokens {
+
+ s.index = newIndex
+ d.byteAvailable = false
+ s.length = minMatchLength - 1
+ if d.tokens.n == maxFlateBlockTokens {
// The block includes the current character
- if d.err = d.writeBlock(d.tokens, d.index); d.err != nil {
+ if d.err = d.writeBlock(&d.tokens, s.index, false); d.err != nil {
return
}
- d.tokens = d.tokens[:0]
+ d.tokens.Reset()
}
+ s.ii = 0
} else {
- if d.fastSkipHashing != skipNever || d.byteAvailable {
- i := d.index - 1
- if d.fastSkipHashing != skipNever {
- i = d.index
- }
- d.tokens = append(d.tokens, literalToken(uint32(d.window[i])))
- if len(d.tokens) == maxFlateBlockTokens {
- if d.err = d.writeBlock(d.tokens, i+1); d.err != nil {
+ // Reset, if we got a match this run.
+ if s.length >= minMatchLength {
+ s.ii = 0
+ }
+ // We have a byte waiting. Emit it.
+ if d.byteAvailable {
+ s.ii++
+ d.tokens.AddLiteral(d.window[s.index-1])
+ if d.tokens.n == maxFlateBlockTokens {
+ if d.err = d.writeBlock(&d.tokens, s.index, false); d.err != nil {
return
}
- d.tokens = d.tokens[:0]
+ d.tokens.Reset()
}
- }
- d.index++
- if d.fastSkipHashing == skipNever {
+ s.index++
+
+ // If we have a long run of no matches, skip additional bytes
+ // Resets when s.ii overflows after 64KB.
+ if n := int(s.ii) - d.chain; n > 0 {
+ n = 1 + int(n>>6)
+ for j := 0; j < n; j++ {
+ if s.index >= d.windowEnd-1 {
+ break
+ }
+ d.tokens.AddLiteral(d.window[s.index-1])
+ if d.tokens.n == maxFlateBlockTokens {
+ if d.err = d.writeBlock(&d.tokens, s.index, false); d.err != nil {
+ return
+ }
+ d.tokens.Reset()
+ }
+ // Index...
+ if s.index < s.maxInsertIndex {
+ h := hash4(d.window[s.index:])
+ ch := s.hashHead[h]
+ s.chainHead = int(ch)
+ s.hashPrev[s.index&windowMask] = ch
+ s.hashHead[h] = uint32(s.index + s.hashOffset)
+ }
+ s.index++
+ }
+ // Flush last byte
+ d.tokens.AddLiteral(d.window[s.index-1])
+ d.byteAvailable = false
+ // s.length = minMatchLength - 1 // not needed, since s.ii is reset above, so it should never be > minMatchLength
+ if d.tokens.n == maxFlateBlockTokens {
+ if d.err = d.writeBlock(&d.tokens, s.index, false); d.err != nil {
+ return
+ }
+ d.tokens.Reset()
+ }
+ }
+ } else {
+ s.index++
d.byteAvailable = true
}
}
}
}
-func (d *compressor) fillStore(b []byte) int {
- n := copy(d.window[d.windowEnd:], b)
- d.windowEnd += n
- return n
-}
-
func (d *compressor) store() {
if d.windowEnd > 0 && (d.windowEnd == maxStoreBlockSize || d.sync) {
d.err = d.writeStoredBlock(d.window[:d.windowEnd])
@@ -524,38 +610,93 @@
}
}
-// storeHuff compresses and stores the currently added data
-// when the d.window is full or we are at the end of the stream.
+// fillBlock will fill the buffer with data for huffman-only compression.
+// The number of bytes copied is returned.
+func (d *compressor) fillBlock(b []byte) int {
+ n := copy(d.window[d.windowEnd:], b)
+ d.windowEnd += n
+ return n
+}
+
+// storeHuff will compress and store the currently added data,
+// if enough has been accumulated or we are at the end of the stream.
// Any error that occurred will be in d.err
func (d *compressor) storeHuff() {
if d.windowEnd < len(d.window) && !d.sync || d.windowEnd == 0 {
return
}
- d.w.writeBlockHuff(false, d.window[:d.windowEnd])
+ d.w.writeBlockHuff(false, d.window[:d.windowEnd], d.sync)
d.err = d.w.err
d.windowEnd = 0
}
+// storeFast will compress and store the currently added data
+// if enough has been accumulated or we are at the end of the stream.
+// Any error that occurred will be in d.err
+func (d *compressor) storeFast() {
+ // We only compress once the window holds maxStoreBlockSize bytes, unless we are flushing.
+ if d.windowEnd < len(d.window) {
+ if !d.sync {
+ return
+ }
+ // Handle extremely small sizes.
+ if d.windowEnd < 128 {
+ if d.windowEnd == 0 {
+ return
+ }
+ if d.windowEnd <= 32 {
+ d.err = d.writeStoredBlock(d.window[:d.windowEnd])
+ } else {
+ d.w.writeBlockHuff(false, d.window[:d.windowEnd], true)
+ d.err = d.w.err
+ }
+ d.tokens.Reset()
+ d.windowEnd = 0
+ d.fast.Reset()
+ return
+ }
+ }
+
+ d.fast.Encode(&d.tokens, d.window[:d.windowEnd])
+ // If we made zero matches, store the block as is.
+ if d.tokens.n == 0 {
+ d.err = d.writeStoredBlock(d.window[:d.windowEnd])
+ // If we removed less than 1/16th, Huffman compress the block.
+ } else if int(d.tokens.n) > d.windowEnd-(d.windowEnd>>4) {
+ d.w.writeBlockHuff(false, d.window[:d.windowEnd], d.sync)
+ d.err = d.w.err
+ } else {
+ d.w.writeBlockDynamic(&d.tokens, false, d.window[:d.windowEnd], d.sync)
+ d.err = d.w.err
+ }
+ d.tokens.Reset()
+ d.windowEnd = 0
+}
+
+// write will add input bytes to the stream.
+// Unless an error occurs, all bytes will be consumed.
func (d *compressor) write(b []byte) (n int, err error) {
if d.err != nil {
return 0, d.err
}
n = len(b)
for len(b) > 0 {
- d.step(d)
+ if d.windowEnd == len(d.window) || d.sync {
+ d.step(d)
+ }
b = b[d.fill(d, b):]
if d.err != nil {
return 0, d.err
}
}
- return n, nil
+ return n, d.err
}
func (d *compressor) syncFlush() error {
+ d.sync = true
if d.err != nil {
return d.err
}
- d.sync = true
d.step(d)
if d.err == nil {
d.w.writeStoredHeader(0, false)
@@ -572,30 +713,33 @@
switch {
case level == NoCompression:
d.window = make([]byte, maxStoreBlockSize)
- d.fill = (*compressor).fillStore
+ d.fill = (*compressor).fillBlock
d.step = (*compressor).store
case level == HuffmanOnly:
- d.window = make([]byte, maxStoreBlockSize)
- d.fill = (*compressor).fillStore
+ d.w.logNewTablePenalty = 10
+ d.window = make([]byte, 32<<10)
+ d.fill = (*compressor).fillBlock
d.step = (*compressor).storeHuff
- case level == BestSpeed:
- d.compressionLevel = levels[level]
- d.window = make([]byte, maxStoreBlockSize)
- d.fill = (*compressor).fillStore
- d.step = (*compressor).encSpeed
- d.bestSpeed = newDeflateFast()
- d.tokens = make([]token, maxStoreBlockSize)
case level == DefaultCompression:
level = 6
fallthrough
- case 2 <= level && level <= 9:
+ case level >= 1 && level <= 6:
+ d.w.logNewTablePenalty = 7
+ d.fast = newFastEnc(level)
+ d.window = make([]byte, maxStoreBlockSize)
+ d.fill = (*compressor).fillBlock
+ d.step = (*compressor).storeFast
+ case 7 <= level && level <= 9:
+ d.w.logNewTablePenalty = 8
+ d.state = &advancedState{}
d.compressionLevel = levels[level]
d.initDeflate()
d.fill = (*compressor).fillDeflate
- d.step = (*compressor).deflate
+ d.step = (*compressor).deflateLazy
default:
return fmt.Errorf("flate: invalid compression level %d: want value in range [-2, 9]", level)
}
+ d.level = level
return nil
}
@@ -603,27 +747,39 @@
d.w.reset(w)
d.sync = false
d.err = nil
- switch d.compressionLevel.level {
- case NoCompression:
+ // We only need to reset a few things for the fast encoders.
+ if d.fast != nil {
+ d.fast.Reset()
d.windowEnd = 0
- case BestSpeed:
+ d.tokens.Reset()
+ return
+ }
+ switch d.compressionLevel.chain {
+ case 0:
+ // level was NoCompression or HuffmanOnly.
d.windowEnd = 0
- d.tokens = d.tokens[:0]
- d.bestSpeed.reset()
default:
- d.chainHead = -1
- clear(d.hashHead[:])
- clear(d.hashPrev[:])
- d.hashOffset = 1
- d.index, d.windowEnd = 0, 0
+ s := d.state
+ s.chainHead = -1
+ for i := range s.hashHead {
+ s.hashHead[i] = 0
+ }
+ for i := range s.hashPrev {
+ s.hashPrev[i] = 0
+ }
+ s.hashOffset = 1
+ s.index, d.windowEnd = 0, 0
d.blockStart, d.byteAvailable = 0, false
- d.tokens = d.tokens[:0]
- d.length = minMatchLength - 1
- d.offset = 0
- d.maxInsertIndex = 0
+ d.tokens.Reset()
+ s.length = minMatchLength - 1
+ s.offset = 0
+ s.ii = 0
+ s.maxInsertIndex = 0
}
}
+var errWriterClosed = errors.New("flate: closed writer")
+
func (d *compressor) close() error {
if d.err == errWriterClosed {
return nil
@@ -644,6 +800,7 @@
return d.w.err
}
d.err = errWriterClosed
+ d.w.reset(nil)
return nil
}
@@ -674,26 +831,15 @@
// can only be decompressed by a reader initialized with the
// same dictionary (see [NewReaderDict]).
func NewWriterDict(w io.Writer, level int, dict []byte) (*Writer, error) {
- dw := &dictWriter{w}
- zw, err := NewWriter(dw, level)
+ zw, err := NewWriter(w, level)
if err != nil {
return nil, err
}
zw.d.fillWindow(dict)
zw.dict = append(zw.dict, dict...) // duplicate dictionary for Reset method.
- return zw, nil
+ return zw, err
}
-type dictWriter struct {
- w io.Writer
-}
-
-func (w *dictWriter) Write(b []byte) (n int, err error) {
- return w.w.Write(b)
-}
-
-var errWriterClosed = errors.New("flate: closed writer")
-
// A Writer takes data written to it and writes the compressed
// form of that data to an underlying writer (see [NewWriter]).
type Writer struct {
@@ -728,16 +874,26 @@
}
// Reset discards the writer's state and makes it equivalent to
// the result of [NewWriter] or [NewWriterDict] called with dst
// and w's level and dictionary.
func (w *Writer) Reset(dst io.Writer) {
- if dw, ok := w.d.w.writer.(*dictWriter); ok {
+ if len(w.dict) > 0 {
// w was created with NewWriterDict
- dw.w = dst
- w.d.reset(dw)
- w.d.fillWindow(w.dict)
+ w.d.reset(dst)
+ if dst != nil {
+ w.d.fillWindow(w.dict)
+ }
} else {
// w was created with NewWriter
w.d.reset(dst)
}
}
+
+// ResetDict discards the writer's state and makes it equivalent to
+// the result of [NewWriterDict] called with dst,
+// w's level, and the given dictionary.
+func (w *Writer) ResetDict(dst io.Writer, dict []byte) {
+ w.dict = dict
+ w.d.reset(dst)
+ w.d.fillWindow(w.dict)
+}
diff --git a/src/compress/flate/deflate_test.go b/src/compress/flate/deflate_test.go
index 3610c7b..4bb89c6 100644
--- a/src/compress/flate/deflate_test.go
+++ b/src/compress/flate/deflate_test.go
@@ -6,14 +6,11 @@
import (
"bytes"
- "errors"
"fmt"
- "internal/testenv"
"io"
- "math/rand"
"os"
"reflect"
- "runtime/debug"
+ "strings"
"sync"
"testing"
)
@@ -35,24 +32,24 @@
}
var deflateTests = []*deflateTest{
- {[]byte{}, 0, []byte{1, 0, 0, 255, 255}},
- {[]byte{0x11}, -1, []byte{18, 4, 4, 0, 0, 255, 255}},
- {[]byte{0x11}, DefaultCompression, []byte{18, 4, 4, 0, 0, 255, 255}},
- {[]byte{0x11}, 4, []byte{18, 4, 4, 0, 0, 255, 255}},
+ 0: {[]byte{}, 0, []byte{0x3, 0x0}},
+ 1: {[]byte{0x11}, BestCompression, []byte{0x12, 0x4, 0xc, 0x0}},
+ 2: {[]byte{0x11}, BestCompression, []byte{0x12, 0x4, 0xc, 0x0}},
+ 3: {[]byte{0x11}, BestCompression, []byte{0x12, 0x4, 0xc, 0x0}},
- {[]byte{0x11}, 0, []byte{0, 1, 0, 254, 255, 17, 1, 0, 0, 255, 255}},
- {[]byte{0x11, 0x12}, 0, []byte{0, 2, 0, 253, 255, 17, 18, 1, 0, 0, 255, 255}},
- {[]byte{0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11}, 0,
- []byte{0, 8, 0, 247, 255, 17, 17, 17, 17, 17, 17, 17, 17, 1, 0, 0, 255, 255},
+ 4: {[]byte{0x11}, 0, []byte{0x0, 0x1, 0x0, 0xfe, 0xff, 0x11, 0x3, 0x0}},
+ 5: {[]byte{0x11, 0x12}, 0, []byte{0x0, 0x2, 0x0, 0xfd, 0xff, 0x11, 0x12, 0x3, 0x0}},
+ 6: {[]byte{0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11}, 0,
+ []byte{0x0, 0x8, 0x0, 0xf7, 0xff, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x3, 0x0},
},
- {[]byte{}, 2, []byte{1, 0, 0, 255, 255}},
- {[]byte{0x11}, 2, []byte{18, 4, 4, 0, 0, 255, 255}},
- {[]byte{0x11, 0x12}, 2, []byte{18, 20, 2, 4, 0, 0, 255, 255}},
- {[]byte{0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11}, 2, []byte{18, 132, 2, 64, 0, 0, 0, 255, 255}},
- {[]byte{}, 9, []byte{1, 0, 0, 255, 255}},
- {[]byte{0x11}, 9, []byte{18, 4, 4, 0, 0, 255, 255}},
- {[]byte{0x11, 0x12}, 9, []byte{18, 20, 2, 4, 0, 0, 255, 255}},
- {[]byte{0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11}, 9, []byte{18, 132, 2, 64, 0, 0, 0, 255, 255}},
+ 7: {[]byte{}, 1, []byte{0x3, 0x0}},
+ 8: {[]byte{0x11}, BestCompression, []byte{0x12, 0x4, 0xc, 0x0}},
+ 9: {[]byte{0x11, 0x12}, BestCompression, []byte{0x12, 0x14, 0x2, 0xc, 0x0}},
+ 10: {[]byte{0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11}, BestCompression, []byte{0x12, 0x84, 0x1, 0xc0, 0x0}},
+ 11: {[]byte{}, 9, []byte{0x3, 0x0}},
+ 12: {[]byte{0x11}, 9, []byte{0x12, 0x4, 0xc, 0x0}},
+ 13: {[]byte{0x11, 0x12}, 9, []byte{0x12, 0x14, 0x2, 0xc, 0x0}},
+ 14: {[]byte{0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11}, 9, []byte{0x12, 0x84, 0x1, 0xc0, 0x0}},
}
var deflateInflateTests = []*deflateInflateTest{
@@ -86,23 +83,24 @@
func TestBulkHash4(t *testing.T) {
for _, x := range deflateTests {
y := x.out
- if len(y) < minMatchLength {
- continue
- }
- y = append(y, y...)
- for j := 4; j < len(y); j++ {
- y := y[:j]
- dst := make([]uint32, len(y)-minMatchLength+1)
- for i := range dst {
- dst[i] = uint32(i + 100)
- }
- bulkHash4(y, dst)
- for i, got := range dst {
- want := hash4(y[i:])
- if got != want && got == uint32(i)+100 {
- t.Errorf("Len:%d Index:%d, want 0x%08x but not modified", len(y), i, want)
- } else if got != want {
- t.Errorf("Len:%d Index:%d, got 0x%08x want:0x%08x", len(y), i, got, want)
+ if len(y) >= minMatchLength {
+ y = append(y, y...)
+ for j := 4; j < len(y); j++ {
+ y := y[:j]
+ dst := make([]uint32, len(y)-minMatchLength+1)
+ for i := range dst {
+ dst[i] = uint32(i + 100)
+ }
+ bulkHash4(y, dst)
+ for i, got := range dst {
+ want := hash4(y[i:])
+ if got != want && got == uint32(i)+100 {
+ t.Errorf("Len:%d Index:%d, expected 0x%08x but not modified", len(y), i, want)
+ } else if got != want {
+ t.Errorf("Len:%d Index:%d, got 0x%08x expected:0x%08x", len(y), i, got, want)
+ }
}
}
}
@@ -110,7 +108,7 @@
}
func TestDeflate(t *testing.T) {
- for _, h := range deflateTests {
+ for i, h := range deflateTests {
var buf bytes.Buffer
w, err := NewWriter(&buf, h.level)
if err != nil {
@@ -120,45 +118,11 @@
w.Write(h.in)
w.Close()
if !bytes.Equal(buf.Bytes(), h.out) {
- t.Errorf("Deflate(%d, %x) = \n%#v, want \n%#v", h.level, h.in, buf.Bytes(), h.out)
+ t.Errorf("%d: Deflate(%d, %x) got \n%#v, want \n%#v", i, h.level, h.in, buf.Bytes(), h.out)
}
}
}
-func TestWriterClose(t *testing.T) {
- b := new(bytes.Buffer)
- zw, err := NewWriter(b, 6)
- if err != nil {
- t.Fatalf("NewWriter: %v", err)
- }
-
- if c, err := zw.Write([]byte("Test")); err != nil || c != 4 {
- t.Fatalf("Write to not closed writer: %s, %d", err, c)
- }
-
- if err := zw.Close(); err != nil {
- t.Fatalf("Close: %v", err)
- }
-
- afterClose := b.Len()
-
- if c, err := zw.Write([]byte("Test")); err == nil || c != 0 {
- t.Fatalf("Write to closed writer: %v, %d", err, c)
- }
-
- if err := zw.Flush(); err == nil {
- t.Fatalf("Flush to closed writer: %s", err)
- }
-
- if err := zw.Close(); err != nil {
- t.Fatalf("Close: %v", err)
- }
-
- if afterClose != b.Len() {
- t.Fatalf("Writer wrote data after close. After close: %d. After writes on closed stream: %d", afterClose, b.Len())
- }
-}
-
// A sparseReader returns a stream consisting of 0s followed by 1<<16 1s.
// This tests missing hash references in a very large input.
type sparseReader struct {
@@ -191,7 +155,8 @@
if testing.Short() {
t.Skip("skipping sparse chunk during short test")
}
- w, err := NewWriter(io.Discard, 1)
+ var buf bytes.Buffer
+ w, err := NewWriter(&buf, 1)
if err != nil {
t.Errorf("NewWriter: %v", err)
return
@@ -200,6 +165,7 @@
t.Errorf("Compress failed: %v", err)
return
}
+ t.Log("Length:", buf.Len())
}
type syncBuffer struct {
@@ -270,7 +236,7 @@
r := NewReader(buf)
// Write half the input and read back.
- for i := 0; i < 2; i++ {
+ for i := range 2 {
var lo, hi int
if i == 0 {
lo, hi = 0, (len(input)+1)/2
@@ -348,13 +314,13 @@
}
w.Write(input)
w.Close()
+ if limit > 0 {
+ t.Logf("level: %d - Size:%.2f%%, %d b\n", level, float64(buffer.Len()*100)/float64(limit), buffer.Len())
+ }
if limit > 0 && buffer.Len() > limit {
t.Errorf("level: %d, len(compress(data)) = %d > limit = %d", level, buffer.Len(), limit)
- return
}
- if limit > 0 {
- t.Logf("level: %d, size:%.2f%%, %d b\n", level, float64(buffer.Len()*100)/float64(limit), buffer.Len())
- }
+
r := NewReader(&buffer)
out, err := io.ReadAll(r)
if err != nil {
@@ -363,6 +329,8 @@
}
r.Close()
if !bytes.Equal(input, out) {
+ os.WriteFile("testdata/fails/"+t.Name()+".got", out, os.ModePerm)
+ os.WriteFile("testdata/fails/"+t.Name()+".want", input, os.ModePerm)
t.Errorf("decompress(compress(data)) != data: level=%d input=%s", level, name)
return
}
@@ -370,19 +338,14 @@
}
func testToFromWithLimit(t *testing.T, input []byte, name string, limit [11]int) {
- for i := 0; i < 10; i++ {
+ for i := range 10 {
testToFromWithLevelAndLimit(t, i, input, name, limit[i])
}
- // Test HuffmanCompression
testToFromWithLevelAndLimit(t, -2, input, name, limit[10])
}
func TestDeflateInflate(t *testing.T) {
- t.Parallel()
for i, h := range deflateInflateTests {
- if testing.Short() && len(h.in) > 10000 {
- continue
- }
testToFromWithLimit(t, h.in, fmt.Sprintf("#%d", i), [11]int{})
}
}
@@ -399,33 +362,38 @@
type deflateInflateStringTest struct {
filename string
label string
- limit [11]int
+ limit [11]int // Index 10 is the HuffmanOnly limit
}
var deflateInflateStringTests = []deflateInflateStringTest{
{
"../testdata/e.txt",
"2.718281828...",
- [...]int{100018, 50650, 50960, 51150, 50930, 50790, 50790, 50790, 50790, 50790, 43683},
+ [...]int{100018, 67900, 50960, 51150, 50930, 50790, 50790, 50790, 50790, 50790, 43683 + 100},
},
{
"../../testdata/Isaac.Newton-Opticks.txt",
"Isaac.Newton-Opticks",
- [...]int{567248, 218338, 198211, 193152, 181100, 175427, 175427, 173597, 173422, 173422, 325240},
+ [...]int{567248, 218338, 201354, 199101, 190627, 182587, 179765, 174982, 173422, 173422, 325240},
},
}
func TestDeflateInflateString(t *testing.T) {
- t.Parallel()
- if testing.Short() && testenv.Builder() == "" {
- t.Skip("skipping in short mode")
- }
for _, test := range deflateInflateStringTests {
gold, err := os.ReadFile(test.filename)
if err != nil {
t.Error(err)
}
- testToFromWithLimit(t, gold, test.label, test.limit)
+ // Remove carriage returns that may be present on Windows.
+ neutral := strings.Map(func(r rune) rune {
+ if r != '\r' {
+ return r
+ }
+ return -1
+ }, string(gold))
+
+ testToFromWithLimit(t, []byte(neutral), test.label, test.limit)
+
if testing.Short() {
break
}
@@ -460,31 +428,36 @@
func TestWriterDict(t *testing.T) {
const (
- dict = "hello world"
- text = "hello again world"
+ dict = "hello world Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
+ text = "hello world Lorem ipsum dolor sit amet"
)
- var b bytes.Buffer
- w, err := NewWriter(&b, 5)
- if err != nil {
- t.Fatalf("NewWriter: %v", err)
- }
- w.Write([]byte(dict))
- w.Flush()
- b.Reset()
- w.Write([]byte(text))
- w.Close()
+ // This test is sensitive to algorithm changes that skip
+ // data in favor of speed. Higher levels are less prone to this,
+ // so we test levels 4 through 8.
+ for l := 4; l < 9; l++ {
+ var b bytes.Buffer
+ w, err := NewWriter(&b, l)
+ if err != nil {
+ t.Fatalf("level %d, NewWriter: %v", l, err)
+ }
+ w.Write([]byte(dict))
+ w.Flush()
+ b.Reset()
+ w.Write([]byte(text))
+ w.Close()
- var b1 bytes.Buffer
- w, _ = NewWriterDict(&b1, 5, []byte(dict))
- w.Write([]byte(text))
- w.Close()
+ var b1 bytes.Buffer
+ w, _ = NewWriterDict(&b1, l, []byte(dict))
+ w.Write([]byte(text))
+ w.Close()
- if !bytes.Equal(b1.Bytes(), b.Bytes()) {
- t.Fatalf("writer wrote %q want %q", b1.Bytes(), b.Bytes())
+ if !bytes.Equal(b1.Bytes(), b.Bytes()) {
+ t.Errorf("level %d, writer wrote\n%v\n want\n%v", l, b1.Bytes(), b.Bytes())
+ }
}
}
-// See https://golang.org/issue/2508
+// See https://go.dev/issue/2508
func TestRegression2508(t *testing.T) {
if testing.Short() {
t.Logf("test disabled with -short")
@@ -495,7 +468,7 @@
t.Fatalf("NewWriter: %v", err)
}
buf := make([]byte, 1024)
- for i := 0; i < 131072; i++ {
+ for range 131072 {
if _, err := w.Write(buf); err != nil {
t.Fatalf("writer failed: %v", err)
}
@@ -504,8 +477,10 @@
}
func TestWriterReset(t *testing.T) {
- t.Parallel()
- for level := 0; level <= 9; level++ {
+ for level := -2; level <= 9; level++ {
+ if level == -1 {
+ // Level -1 (DefaultCompression) aliases level 6; skip it.
+ level++
+ }
if testing.Short() && level > 1 {
break
}
@@ -514,11 +489,7 @@
t.Fatalf("NewWriter: %v", err)
}
buf := []byte("hello world")
- n := 1024
- if testing.Short() {
- n = 10
- }
- for i := 0; i < n; i++ {
+ for range 1024 {
w.Write(buf)
}
w.Reset(io.Discard)
@@ -531,12 +502,12 @@
// DeepEqual doesn't compare functions.
w.d.fill, wref.d.fill = nil, nil
w.d.step, wref.d.step = nil, nil
- w.d.bulkHasher, wref.d.bulkHasher = nil, nil
- w.d.bestSpeed, wref.d.bestSpeed = nil, nil
+ w.d.state, wref.d.state = nil, nil
+ w.d.fast, wref.d.fast = nil, nil
+
// hashMatch is always overwritten when used.
- copy(w.d.hashMatch[:], wref.d.hashMatch[:])
- if len(w.d.tokens) != 0 {
- t.Errorf("level %d Writer not reset after Reset. %d tokens were present", level, len(w.d.tokens))
+ if w.d.tokens.n != 0 {
+ t.Errorf("level %d Writer not reset after Reset. %d tokens were present", level, w.d.tokens.n)
}
// As long as the length is 0, we don't care about the content.
w.d.tokens = wref.d.tokens
@@ -548,76 +519,64 @@
}
}
- levels := []int{0, 1, 2, 5, 9}
- for _, level := range levels {
- t.Run(fmt.Sprint(level), func(t *testing.T) {
- testResetOutput(t, level, nil)
+ for i := HuffmanOnly; i <= BestCompression; i++ {
+ testResetOutput(t, fmt.Sprint("level-", i), func(w io.Writer) (*Writer, error) { return NewWriter(w, i) })
+ }
+ dict := []byte(strings.Repeat("we are the world - how are you?", 3))
+ for i := HuffmanOnly; i <= BestCompression; i++ {
+ testResetOutput(t, fmt.Sprint("dict-level-", i), func(w io.Writer) (*Writer, error) { return NewWriterDict(w, i, dict) })
+ }
+ for i := HuffmanOnly; i <= BestCompression; i++ {
+ testResetOutput(t, fmt.Sprint("dict-reset-level-", i), func(w io.Writer) (*Writer, error) {
+ w2, err := NewWriter(nil, i)
+ if err != nil {
+ return w2, err
+ }
+ w2.ResetDict(w, dict)
+ return w2, nil
})
}
-
- t.Run("dict", func(t *testing.T) {
- for _, level := range levels {
- t.Run(fmt.Sprint(level), func(t *testing.T) {
- testResetOutput(t, level, nil)
- })
- }
- })
}
-func testResetOutput(t *testing.T, level int, dict []byte) {
- writeData := func(w *Writer) {
- msg := []byte("now is the time for all good gophers")
- w.Write(msg)
- w.Flush()
-
- hello := []byte("hello world")
- for i := 0; i < 1024; i++ {
- w.Write(hello)
+func testResetOutput(t *testing.T, name string, newWriter func(w io.Writer) (*Writer, error)) {
+ t.Run(name, func(t *testing.T) {
+ buf := new(bytes.Buffer)
+ w, err := newWriter(buf)
+ if err != nil {
+ t.Fatalf("NewWriter: %v", err)
}
+ b := []byte("hello world - how are you doing?")
+ for range 1024 {
+ w.Write(b)
+ }
+ w.Close()
+ out1 := buf.Bytes()
- fill := bytes.Repeat([]byte("x"), 65000)
- w.Write(fill)
- }
+ buf2 := new(bytes.Buffer)
+ w.Reset(buf2)
+ for range 1024 {
+ w.Write(b)
+ }
+ w.Close()
+ out2 := buf2.Bytes()
- buf := new(bytes.Buffer)
- var w *Writer
- var err error
- if dict == nil {
- w, err = NewWriter(buf, level)
- } else {
- w, err = NewWriterDict(buf, level, dict)
- }
- if err != nil {
- t.Fatalf("NewWriter: %v", err)
- }
-
- writeData(w)
- w.Close()
- out1 := buf.Bytes()
-
- buf2 := new(bytes.Buffer)
- w.Reset(buf2)
- writeData(w)
- w.Close()
- out2 := buf2.Bytes()
-
- if len(out1) != len(out2) {
- t.Errorf("got %d, expected %d bytes", len(out2), len(out1))
- return
- }
- if !bytes.Equal(out1, out2) {
- mm := 0
- for i, b := range out1[:len(out2)] {
- if b != out2[i] {
- t.Errorf("mismatch index %d: %#02x, expected %#02x", i, out2[i], b)
- }
- mm++
- if mm == 10 {
- t.Fatal("Stopping")
+ if len(out1) != len(out2) {
+ t.Errorf("got %d, expected %d bytes", len(out2), len(out1))
+ }
+ if !bytes.Equal(out1, out2) {
+ mm := 0
+ for i, b := range out1[:len(out2)] {
+ if b != out2[i] {
+ t.Errorf("mismatch index %d: %02x, expected %02x", i, out2[i], b)
+ }
+ mm++
+ if mm == 10 {
+ t.Fatal("Stopping")
+ }
}
}
- }
- t.Logf("got %d bytes", len(out1))
+ t.Logf("got %d bytes", len(out1))
+ })
}
// TestBestSpeed tests that round-tripping through deflate and then inflate
@@ -625,7 +584,6 @@
// compressor.encSpeed method (0, 16, 128), as well as near maxStoreBlockSize
// (65535).
func TestBestSpeed(t *testing.T) {
- t.Parallel()
abc := make([]byte, 128)
for i := range abc {
abc[i] = byte(i)
@@ -653,8 +611,8 @@
}
for i, tc := range testCases {
- if i >= 3 && testing.Short() {
- break
+ if testing.Short() && i > 5 {
+ t.Skip()
}
for _, firstN := range []int{1, 65534, 65535, 65536, 65537, 131072} {
tc[0] = firstN
@@ -703,368 +661,3 @@
}
}
}
-
-var errIO = errors.New("IO error")
-
-// failWriter fails with errIO exactly at the nth call to Write.
-type failWriter struct{ n int }
-
-func (w *failWriter) Write(b []byte) (int, error) {
- w.n--
- if w.n == -1 {
- return 0, errIO
- }
- return len(b), nil
-}
-
-func TestWriterPersistentWriteError(t *testing.T) {
- t.Parallel()
- d, err := os.ReadFile("../../testdata/Isaac.Newton-Opticks.txt")
- if err != nil {
- t.Fatalf("ReadFile: %v", err)
- }
- d = d[:10000] // Keep this test short
-
- zw, err := NewWriter(nil, DefaultCompression)
- if err != nil {
- t.Fatalf("NewWriter: %v", err)
- }
-
- // Sweep over the threshold at which an error is returned.
- // The variable i makes it such that the ith call to failWriter.Write will
- // return errIO. Since failWriter errors are not persistent, we must ensure
- // that flate.Writer errors are persistent.
- for i := 0; i < 1000; i++ {
- fw := &failWriter{i}
- zw.Reset(fw)
-
- _, werr := zw.Write(d)
- cerr := zw.Close()
- ferr := zw.Flush()
- if werr != errIO && werr != nil {
- t.Errorf("test %d, mismatching Write error: got %v, want %v", i, werr, errIO)
- }
- if cerr != errIO && fw.n < 0 {
- t.Errorf("test %d, mismatching Close error: got %v, want %v", i, cerr, errIO)
- }
- if ferr != errIO && fw.n < 0 {
- t.Errorf("test %d, mismatching Flush error: got %v, want %v", i, ferr, errIO)
- }
- if fw.n >= 0 {
- // At this point, the failure threshold was sufficiently high enough
- // that we wrote the whole stream without any errors.
- return
- }
- }
-}
-func TestWriterPersistentFlushError(t *testing.T) {
- zw, err := NewWriter(&failWriter{0}, DefaultCompression)
- if err != nil {
- t.Fatalf("NewWriter: %v", err)
- }
- flushErr := zw.Flush()
- closeErr := zw.Close()
- _, writeErr := zw.Write([]byte("Test"))
- checkErrors([]error{closeErr, flushErr, writeErr}, errIO, t)
-}
-
-func TestWriterPersistentCloseError(t *testing.T) {
- // If underlying writer return error on closing stream we should persistent this error across all writer calls.
- zw, err := NewWriter(&failWriter{0}, DefaultCompression)
- if err != nil {
- t.Fatalf("NewWriter: %v", err)
- }
- closeErr := zw.Close()
- flushErr := zw.Flush()
- _, writeErr := zw.Write([]byte("Test"))
- checkErrors([]error{closeErr, flushErr, writeErr}, errIO, t)
-
- // After closing writer we should persistent "write after close" error across Flush and Write calls, but return nil
- // on next Close calls.
- var b bytes.Buffer
- zw.Reset(&b)
- err = zw.Close()
- if err != nil {
- t.Fatalf("First call to close returned error: %s", err)
- }
- err = zw.Close()
- if err != nil {
- t.Fatalf("Second call to close returned error: %s", err)
- }
-
- flushErr = zw.Flush()
- _, writeErr = zw.Write([]byte("Test"))
- checkErrors([]error{flushErr, writeErr}, errWriterClosed, t)
-}
-
-func checkErrors(got []error, want error, t *testing.T) {
- t.Helper()
- for _, err := range got {
- if err != want {
- t.Errorf("Error doesn't match\nWant: %s\nGot: %s", want, got)
- }
- }
-}
-
-func TestBestSpeedMatch(t *testing.T) {
- t.Parallel()
- cases := []struct {
- previous, current []byte
- t, s, want int32
- }{{
- previous: []byte{0, 0, 0, 1, 2},
- current: []byte{3, 4, 5, 0, 1, 2, 3, 4, 5},
- t: -3,
- s: 3,
- want: 6,
- }, {
- previous: []byte{0, 0, 0, 1, 2},
- current: []byte{2, 4, 5, 0, 1, 2, 3, 4, 5},
- t: -3,
- s: 3,
- want: 3,
- }, {
- previous: []byte{0, 0, 0, 1, 1},
- current: []byte{3, 4, 5, 0, 1, 2, 3, 4, 5},
- t: -3,
- s: 3,
- want: 2,
- }, {
- previous: []byte{0, 0, 0, 1, 2},
- current: []byte{2, 2, 2, 2, 1, 2, 3, 4, 5},
- t: -1,
- s: 0,
- want: 4,
- }, {
- previous: []byte{0, 0, 0, 1, 2, 3, 4, 5, 2, 2},
- current: []byte{2, 2, 2, 2, 1, 2, 3, 4, 5},
- t: -7,
- s: 4,
- want: 5,
- }, {
- previous: []byte{9, 9, 9, 9, 9},
- current: []byte{2, 2, 2, 2, 1, 2, 3, 4, 5},
- t: -1,
- s: 0,
- want: 0,
- }, {
- previous: []byte{9, 9, 9, 9, 9},
- current: []byte{9, 2, 2, 2, 1, 2, 3, 4, 5},
- t: 0,
- s: 1,
- want: 0,
- }, {
- previous: []byte{},
- current: []byte{9, 2, 2, 2, 1, 2, 3, 4, 5},
- t: -5,
- s: 1,
- want: 0,
- }, {
- previous: []byte{},
- current: []byte{9, 2, 2, 2, 1, 2, 3, 4, 5},
- t: -1,
- s: 1,
- want: 0,
- }, {
- previous: []byte{},
- current: []byte{2, 2, 2, 2, 1, 2, 3, 4, 5},
- t: 0,
- s: 1,
- want: 3,
- }, {
- previous: []byte{3, 4, 5},
- current: []byte{3, 4, 5},
- t: -3,
- s: 0,
- want: 3,
- }, {
- previous: make([]byte, 1000),
- current: make([]byte, 1000),
- t: -1000,
- s: 0,
- want: maxMatchLength - 4,
- }, {
- previous: make([]byte, 200),
- current: make([]byte, 500),
- t: -200,
- s: 0,
- want: maxMatchLength - 4,
- }, {
- previous: make([]byte, 200),
- current: make([]byte, 500),
- t: 0,
- s: 1,
- want: maxMatchLength - 4,
- }, {
- previous: make([]byte, maxMatchLength-4),
- current: make([]byte, 500),
- t: -(maxMatchLength - 4),
- s: 0,
- want: maxMatchLength - 4,
- }, {
- previous: make([]byte, 200),
- current: make([]byte, 500),
- t: -200,
- s: 400,
- want: 100,
- }, {
- previous: make([]byte, 10),
- current: make([]byte, 500),
- t: 200,
- s: 400,
- want: 100,
- }}
- for i, c := range cases {
- e := deflateFast{prev: c.previous}
- got := e.matchLen(c.s, c.t, c.current)
- if got != c.want {
- t.Errorf("Test %d: match length, want %d, got %d", i, c.want, got)
- }
- }
-}
-
-func TestBestSpeedMaxMatchOffset(t *testing.T) {
- t.Parallel()
- const abc, xyz = "abcdefgh", "stuvwxyz"
- for _, matchBefore := range []bool{false, true} {
- for _, extra := range []int{0, inputMargin - 1, inputMargin, inputMargin + 1, 2 * inputMargin} {
- for offsetAdj := -5; offsetAdj <= +5; offsetAdj++ {
- report := func(desc string, err error) {
- t.Errorf("matchBefore=%t, extra=%d, offsetAdj=%d: %s%v",
- matchBefore, extra, offsetAdj, desc, err)
- }
-
- offset := maxMatchOffset + offsetAdj
-
- // Make src to be a []byte of the form
- // "%s%s%s%s%s" % (abc, zeros0, xyzMaybe, abc, zeros1)
- // where:
- // zeros0 is approximately maxMatchOffset zeros.
- // xyzMaybe is either xyz or the empty string.
- // zeros1 is between 0 and 30 zeros.
- // The difference between the two abc's will be offset, which
- // is maxMatchOffset plus or minus a small adjustment.
- src := make([]byte, offset+len(abc)+extra)
- copy(src, abc)
- if !matchBefore {
- copy(src[offset-len(xyz):], xyz)
- }
- copy(src[offset:], abc)
-
- buf := new(bytes.Buffer)
- w, err := NewWriter(buf, BestSpeed)
- if err != nil {
- report("NewWriter: ", err)
- continue
- }
- if _, err := w.Write(src); err != nil {
- report("Write: ", err)
- continue
- }
- if err := w.Close(); err != nil {
- report("Writer.Close: ", err)
- continue
- }
-
- r := NewReader(buf)
- dst, err := io.ReadAll(r)
- r.Close()
- if err != nil {
- report("ReadAll: ", err)
- continue
- }
-
- if !bytes.Equal(dst, src) {
- report("", fmt.Errorf("bytes differ after round-tripping"))
- continue
- }
- }
- }
- }
-}
-
-func TestBestSpeedShiftOffsets(t *testing.T) {
- // Test if shiftoffsets properly preserves matches and resets out-of-range matches
- // seen in https://github.com/golang/go/issues/4142
- enc := newDeflateFast()
-
- // testData may not generate internal matches.
- testData := make([]byte, 32)
- rng := rand.New(rand.NewSource(0))
- for i := range testData {
- testData[i] = byte(rng.Uint32())
- }
-
- // Encode the testdata with clean state.
- // Second part should pick up matches from the first block.
- wantFirstTokens := len(enc.encode(nil, testData))
- wantSecondTokens := len(enc.encode(nil, testData))
-
- if wantFirstTokens <= wantSecondTokens {
- t.Fatalf("test needs matches between inputs to be generated")
- }
- // Forward the current indicator to before wraparound.
- enc.cur = bufferReset - int32(len(testData))
-
- // Part 1 before wrap, should match clean state.
- got := len(enc.encode(nil, testData))
- if wantFirstTokens != got {
- t.Errorf("got %d, want %d tokens", got, wantFirstTokens)
- }
-
- // Verify we are about to wrap.
- if enc.cur != bufferReset {
- t.Errorf("got %d, want e.cur to be at bufferReset (%d)", enc.cur, bufferReset)
- }
-
- // Part 2 should match clean state as well even if wrapped.
- got = len(enc.encode(nil, testData))
- if wantSecondTokens != got {
- t.Errorf("got %d, want %d token", got, wantSecondTokens)
- }
-
- // Verify that we wrapped.
- if enc.cur >= bufferReset {
- t.Errorf("want e.cur to be < bufferReset (%d), got %d", bufferReset, enc.cur)
- }
-
- // Forward the current buffer, leaving the matches at the bottom.
- enc.cur = bufferReset
- enc.shiftOffsets()
-
- // Ensure that no matches were picked up.
- got = len(enc.encode(nil, testData))
- if wantFirstTokens != got {
- t.Errorf("got %d, want %d tokens", got, wantFirstTokens)
- }
-}
-
-func TestMaxStackSize(t *testing.T) {
- // This test must not run in parallel with other tests as debug.SetMaxStack
- // affects all goroutines.
- n := debug.SetMaxStack(1 << 16)
- defer debug.SetMaxStack(n)
-
- var wg sync.WaitGroup
- defer wg.Wait()
-
- b := make([]byte, 1<<20)
- for level := HuffmanOnly; level <= BestCompression; level++ {
- // Run in separate goroutine to increase probability of stack regrowth.
- wg.Add(1)
- go func(level int) {
- defer wg.Done()
- zw, err := NewWriter(io.Discard, level)
- if err != nil {
- t.Errorf("level %d, NewWriter() = %v, want nil", level, err)
- }
- if n, err := zw.Write(b); n != len(b) || err != nil {
- t.Errorf("level %d, Write() = (%d, %v), want (%d, nil)", level, n, err, len(b))
- }
- if err := zw.Close(); err != nil {
- t.Errorf("level %d, Close() = %v, want nil", level, err)
- }
- zw.Reset(io.Discard)
- }(level)
- }
-}
diff --git a/src/compress/flate/deflatefast.go b/src/compress/flate/deflatefast.go
index e5554d6..e132c55 100644
--- a/src/compress/flate/deflatefast.go
+++ b/src/compress/flate/deflatefast.go
@@ -4,304 +4,170 @@
package flate
-import "math"
-
-// This encoding algorithm, which prioritizes speed over output size, is
-// based on Snappy's LZ77-style encoder: github.com/golang/snappy
-
-const (
- tableBits = 14 // Bits used in the table.
- tableSize = 1 << tableBits // Size of the table.
- tableMask = tableSize - 1 // Mask for table indices. Redundant, but can eliminate bounds checks.
- tableShift = 32 - tableBits // Right-shift to get the tableBits most significant bits of a uint32.
-
- // Reset the buffer offset when reaching this.
- // Offsets are stored between blocks as int32 values.
- // Since the offset we are checking against is at the beginning
- // of the buffer, we need to subtract the current and input
- // buffer to not risk overflowing the int32.
- bufferReset = math.MaxInt32 - maxStoreBlockSize*2
+import (
+ "math/bits"
)
-func load32(b []byte, i int32) uint32 {
- b = b[i : i+4 : len(b)] // Help the compiler eliminate bounds checks on the next line.
- return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
+type fastEnc interface {
+ Encode(dst *tokens, src []byte)
+ Reset()
}
-func load64(b []byte, i int32) uint64 {
- b = b[i : i+8 : len(b)] // Help the compiler eliminate bounds checks on the next line.
- return uint64(b[0]) | uint64(b[1])<<8 | uint64(b[2])<<16 | uint64(b[3])<<24 |
- uint64(b[4])<<32 | uint64(b[5])<<40 | uint64(b[6])<<48 | uint64(b[7])<<56
+func newFastEnc(level int) fastEnc {
+ switch level {
+ case 1:
+ return &fastEncL1{fastGen: fastGen{cur: maxStoreBlockSize}}
+ case 2:
+ return &fastEncL2{fastGen: fastGen{cur: maxStoreBlockSize}}
+ case 3:
+ return &fastEncL3{fastGen: fastGen{cur: maxStoreBlockSize}}
+ case 4:
+ return &fastEncL4{fastGen: fastGen{cur: maxStoreBlockSize}}
+ case 5:
+ return &fastEncL5{fastGen: fastGen{cur: maxStoreBlockSize}}
+ case 6:
+ return &fastEncL6{fastGen: fastGen{cur: maxStoreBlockSize}}
+ default:
+ panic("invalid level specified")
+ }
}
-func hash(u uint32) uint32 {
- return (u * 0x1e35a7bd) >> tableShift
-}
-
-// These constants are defined by the Snappy implementation so that its
-// assembly implementation can fast-path some 16-bytes-at-a-time copies. They
-// aren't necessary in the pure Go implementation, as we don't use those same
-// optimizations, but using the same thresholds doesn't really hurt.
const (
- inputMargin = 16 - 1
- minNonLiteralBlockSize = 1 + 1 + inputMargin
+ tableBits = 15 // Bits used in the table
+ tableSize = 1 << tableBits // Size of the table
+ baseMatchOffset = 1 // The smallest match offset
+	baseMatchLength = 3       // The smallest match length per RFC 1951, section 3.2.5
+ maxMatchOffset = 1 << 15 // The largest match offset
+
+ bTableBits = 17 // Bits used in the big tables
+ bTableSize = 1 << bTableBits // Size of the table
+ allocHistory = maxStoreBlockSize * 5 // Size to preallocate for history.
+ bufferReset = (1 << 31) - allocHistory - maxStoreBlockSize - 1 // Reset the buffer offset when reaching this.
+)
+
+const (
+ prime3bytes = 506832829
+ prime4bytes = 2654435761
+ prime5bytes = 889523592379
+ prime6bytes = 227718039650203
+ prime7bytes = 58295818150454627
+ prime8bytes = 0xcf1bbcdcb7a56463
)
type tableEntry struct {
- val uint32 // Value at destination
offset int32
}
-// deflateFast maintains the table for matches,
-// and the previous byte block for cross block matching.
-type deflateFast struct {
- table [tableSize]tableEntry
- prev []byte // Previous block, zero length if unknown.
- cur int32 // Current match offset.
+// fastGen maintains the table for matches,
+// and the previous byte block for level 2.
+// This is the generic implementation.
+type fastGen struct {
+ hist []byte
+ cur int32
}
-func newDeflateFast() *deflateFast {
- return &deflateFast{cur: maxStoreBlockSize, prev: make([]byte, 0, maxStoreBlockSize)}
+func (e *fastGen) addBlock(src []byte) int32 {
+ // check if we have space already
+ if len(e.hist)+len(src) > cap(e.hist) {
+ if cap(e.hist) == 0 {
+ e.hist = make([]byte, 0, allocHistory)
+ } else {
+ if cap(e.hist) < maxMatchOffset*2 {
+ panic("unexpected buffer size")
+ }
+ // Move down
+ offset := int32(len(e.hist)) - maxMatchOffset
+ // copy(e.hist[0:maxMatchOffset], e.hist[offset:])
+ *(*[maxMatchOffset]byte)(e.hist) = *(*[maxMatchOffset]byte)(e.hist[offset:])
+ e.cur += offset
+ e.hist = e.hist[:maxMatchOffset]
+ }
+ }
+ s := int32(len(e.hist))
+ e.hist = append(e.hist, src...)
+ return s
}
-// encode encodes a block given in src and appends tokens
-// to dst and returns the result.
-func (e *deflateFast) encode(dst []token, src []byte) []token {
- // Ensure that e.cur doesn't wrap.
- if e.cur >= bufferReset {
- e.shiftOffsets()
- }
-
- // This check isn't in the Snappy implementation, but there, the caller
- // instead of the callee handles this case.
- if len(src) < minNonLiteralBlockSize {
- e.cur += maxStoreBlockSize
- e.prev = e.prev[:0]
- return emitLiteral(dst, src)
- }
-
- // sLimit is when to stop looking for offset/length copies. The inputMargin
- // lets us use a fast path for emitLiteral in the main loop, while we are
- // looking for copies.
- sLimit := int32(len(src) - inputMargin)
-
- // nextEmit is where in src the next emitLiteral should start from.
- nextEmit := int32(0)
- s := int32(0)
- cv := load32(src, s)
- nextHash := hash(cv)
-
- for {
- // Copied from the C++ snappy implementation:
- //
- // Heuristic match skipping: If 32 bytes are scanned with no matches
- // found, start looking only at every other byte. If 32 more bytes are
- // scanned (or skipped), look at every third byte, etc.. When a match
- // is found, immediately go back to looking at every byte. This is a
- // small loss (~5% performance, ~0.1% density) for compressible data
- // due to more bookkeeping, but for non-compressible data (such as
- // JPEG) it's a huge win since the compressor quickly "realizes" the
- // data is incompressible and doesn't bother looking for matches
- // everywhere.
- //
- // The "skip" variable keeps track of how many bytes there are since
- // the last match; dividing it by 32 (ie. right-shifting by five) gives
- // the number of bytes to move ahead for each iteration.
- skip := int32(32)
-
- nextS := s
- var candidate tableEntry
- for {
- s = nextS
- bytesBetweenHashLookups := skip >> 5
- nextS = s + bytesBetweenHashLookups
- skip += bytesBetweenHashLookups
- if nextS > sLimit {
- goto emitRemainder
- }
- candidate = e.table[nextHash&tableMask]
- now := load32(src, nextS)
- e.table[nextHash&tableMask] = tableEntry{offset: s + e.cur, val: cv}
- nextHash = hash(now)
-
- offset := s - (candidate.offset - e.cur)
- if offset > maxMatchOffset || cv != candidate.val {
- // Out of range or not matched.
- cv = now
- continue
- }
- break
- }
-
- // A 4-byte match has been found. We'll later see if more than 4 bytes
- // match. But, prior to the match, src[nextEmit:s] are unmatched. Emit
- // them as literal bytes.
- dst = emitLiteral(dst, src[nextEmit:s])
-
- // Call emitCopy, and then see if another emitCopy could be our next
- // move. Repeat until we find no match for the input immediately after
- // what was consumed by the last emitCopy call.
- //
- // If we exit this loop normally then we need to call emitLiteral next,
- // though we don't yet know how big the literal will be. We handle that
- // by proceeding to the next iteration of the main loop. We also can
- // exit this loop via goto if we get close to exhausting the input.
- for {
- // Invariant: we have a 4-byte match at s, and no need to emit any
- // literal bytes prior to s.
-
- // Extend the 4-byte match as long as possible.
- //
- s += 4
- t := candidate.offset - e.cur + 4
- l := e.matchLen(s, t, src)
-
- // matchToken is flate's equivalent of Snappy's emitCopy. (length,offset)
- dst = append(dst, matchToken(uint32(l+4-baseMatchLength), uint32(s-t-baseMatchOffset)))
- s += l
- nextEmit = s
- if s >= sLimit {
- goto emitRemainder
- }
-
- // We could immediately start working at s now, but to improve
- // compression we first update the hash table at s-1 and at s. If
- // another emitCopy is not our next move, also calculate nextHash
- // at s+1. At least on GOARCH=amd64, these three hash calculations
- // are faster as one load64 call (with some shifts) instead of
- // three load32 calls.
- x := load64(src, s-1)
- prevHash := hash(uint32(x))
- e.table[prevHash&tableMask] = tableEntry{offset: e.cur + s - 1, val: uint32(x)}
- x >>= 8
- currHash := hash(uint32(x))
- candidate = e.table[currHash&tableMask]
- e.table[currHash&tableMask] = tableEntry{offset: e.cur + s, val: uint32(x)}
-
- offset := s - (candidate.offset - e.cur)
- if offset > maxMatchOffset || uint32(x) != candidate.val {
- cv = uint32(x >> 8)
- nextHash = hash(cv)
- s++
- break
- }
- }
- }
-
-emitRemainder:
- if int(nextEmit) < len(src) {
- dst = emitLiteral(dst, src[nextEmit:])
- }
- e.cur += int32(len(src))
- e.prev = e.prev[:len(src)]
- copy(e.prev, src)
- return dst
+type tableEntryPrev struct {
+ Cur tableEntry
+ Prev tableEntry
}
-func emitLiteral(dst []token, lit []byte) []token {
- for _, v := range lit {
- dst = append(dst, literalToken(uint32(v)))
- }
- return dst
+// hash7 returns the hash of the lowest 7 bytes of u to fit in a hash table with h bits.
+// Preferably h should be a constant and should always be <64.
+func hash7(u uint64, h uint8) uint32 {
+ return uint32(((u << (64 - 56)) * prime7bytes) >> ((64 - h) & reg8SizeMask64))
}
-// matchLen returns the match length between src[s:] and src[t:].
-// t can be negative to indicate the match is starting in e.prev.
-// We assume that src[s-4:s] and src[t-4:t] already match.
-func (e *deflateFast) matchLen(s, t int32, src []byte) int32 {
- s1 := int(s) + maxMatchLength - 4
- if s1 > len(src) {
- s1 = len(src)
+// hashLen returns a hash of the lowest mls bytes of u with length output bits.
+// mls must be >= 3 and <= 8. Any other value returns the hash for 4 bytes.
+// length should always be < 32.
+// Preferably, length and mls should be constants for inlining.
+func hashLen(u uint64, length, mls uint8) uint32 {
+ switch mls {
+ case 3:
+ return (uint32(u<<8) * prime3bytes) >> (32 - length)
+ case 5:
+ return uint32(((u << (64 - 40)) * prime5bytes) >> (64 - length))
+ case 6:
+ return uint32(((u << (64 - 48)) * prime6bytes) >> (64 - length))
+ case 7:
+ return uint32(((u << (64 - 56)) * prime7bytes) >> (64 - length))
+ case 8:
+ return uint32((u * prime8bytes) >> (64 - length))
+ default:
+ return (uint32(u) * prime4bytes) >> (32 - length)
}
+}
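The hashLen function above is multiplicative ("Fibonacci") hashing: shift the bytes of interest to the top of the word, multiply by a large odd constant, and keep the top `length` bits. A standalone sketch of the 5-byte case, using the prime from the constants above (the function name here is illustrative, not part of the patch):

```go
package main

import "fmt"

const prime5bytes = 889523592379 // from the const block above

// hash5 hashes the lowest 5 bytes of u into tableBits output bits.
func hash5(u uint64, tableBits uint8) uint32 {
	// Shift the 5 input bytes to the top of the word so the multiply
	// mixes every input bit into the high output bits, then keep the
	// top tableBits bits as the table index.
	return uint32(((u << (64 - 40)) * prime5bytes) >> (64 - tableBits))
}

func main() {
	h := hash5(0x48656c6c6f, 15) // bytes of "Hello", 15-bit table
	fmt.Println(h < 1<<15)       // true: always fits the table
}
```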
- // If we are inside the current block
- if t >= 0 {
- b := src[t:]
- a := src[s:s1]
- b = b[:len(a)]
- // Extend the match to be as long as possible.
- for i := range a {
- if a[i] != b[i] {
- return int32(i)
- }
+// matchLenLimited returns the match length between offsets s and t in src.
+// The maximum length returned is maxMatchLength - 4.
+// It is assumed that s > t, that t >= 0, and that s < len(src).
+func (e *fastGen) matchLenLimited(s, t int, src []byte) int32 {
+ a := src[s:min(s+maxMatchLength-4, len(src))]
+ b := src[t:]
+ return int32(matchLen(a, b))
+}
+
+// matchlenLong returns the match length between offsets s and t in src.
+// It is assumed that s > t, that t >= 0, and that s < len(src).
+func (e *fastGen) matchlenLong(s, t int, src []byte) int32 {
+ return int32(matchLen(src[s:], src[t:]))
+}
+
+// Reset the encoding table.
+func (e *fastGen) Reset() {
+ if cap(e.hist) < allocHistory {
+ e.hist = make([]byte, 0, allocHistory)
+ }
+	// Offset the current position so every previous match is out of reach.
+	// If we are above bufferReset it will be cleared anyway, since len(hist) == 0.
+ if e.cur <= bufferReset {
+ e.cur += maxMatchOffset + int32(len(e.hist))
+ }
+ e.hist = e.hist[:0]
+}
+
+// matchLen returns the maximum common prefix length of a and b.
+// a must be the shorter of the two.
+func matchLen(a, b []byte) (n int) {
+ left := len(a)
+ for left >= 8 {
+ diff := loadLE64(a, n) ^ loadLE64(b, n)
+ if diff != 0 {
+ return n + bits.TrailingZeros64(diff)>>3
}
- return int32(len(a))
+ n += 8
+ left -= 8
}
- // We found a match in the previous block.
- tp := int32(len(e.prev)) + t
- if tp < 0 {
- return 0
- }
-
- // Extend the match to be as long as possible.
- a := src[s:s1]
- b := e.prev[tp:]
- if len(b) > len(a) {
- b = b[:len(a)]
- }
- a = a[:len(b)]
- for i := range b {
- if a[i] != b[i] {
- return int32(i)
- }
- }
-
- // If we reached our limit, we matched everything we are
- // allowed to in the previous block and we return.
- n := int32(len(b))
- if int(s+n) == s1 {
- return n
- }
-
- // Continue looking for more matches in the current block.
- a = src[s+n : s1]
- b = src[:len(a)]
+ a = a[n:]
+ b = b[n:]
for i := range a {
if a[i] != b[i] {
- return int32(i) + n
+ break
}
+ n++
}
- return int32(len(a)) + n
-}
-
-// Reset resets the encoding history.
-// This ensures that no matches are made to the previous block.
-func (e *deflateFast) reset() {
- e.prev = e.prev[:0]
- // Bump the offset, so all matches will fail distance check.
- // Nothing should be >= e.cur in the table.
- e.cur += maxMatchOffset
-
- // Protect against e.cur wraparound.
- if e.cur >= bufferReset {
- e.shiftOffsets()
- }
-}
-
-// shiftOffsets will shift down all match offset.
-// This is only called in rare situations to prevent integer overflow.
-//
-// See https://golang.org/issue/18636 and https://github.com/golang/go/issues/34121.
-func (e *deflateFast) shiftOffsets() {
- if len(e.prev) == 0 {
- // We have no history; just clear the table.
- clear(e.table[:])
- e.cur = maxMatchOffset + 1
- return
- }
-
- // Shift down everything in the table that isn't already too far away.
- for i := range e.table[:] {
- v := e.table[i].offset - e.cur + maxMatchOffset + 1
- if v < 0 {
- // We want to reset e.cur to maxMatchOffset + 1, so we need to shift
- // all table entries down by (e.cur - (maxMatchOffset + 1)).
- // Because we ignore matches > maxMatchOffset, we can cap
- // any negative offsets at 0.
- v = 0
- }
- e.table[i].offset = v
- }
- e.cur = maxMatchOffset + 1
+ return n
}
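The new matchLen compares 8 bytes at a time: XOR-ing two little-endian words makes the first differing byte visible as the lowest set bit, which `bits.TrailingZeros64` locates in one instruction. A self-contained sketch of the same technique, with `encoding/binary` standing in for the patch's `loadLE64` helper:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// matchLenSketch returns the length of the common prefix of a and b,
// comparing 8 bytes at a time. a must be the shorter slice.
func matchLenSketch(a, b []byte) (n int) {
	for len(a)-n >= 8 {
		// XOR the two words: a zero result means 8 matching bytes;
		// otherwise the trailing zero count, divided by 8, is the
		// number of matching bytes before the first difference.
		diff := binary.LittleEndian.Uint64(a[n:]) ^ binary.LittleEndian.Uint64(b[n:])
		if diff != 0 {
			return n + bits.TrailingZeros64(diff)>>3
		}
		n += 8
	}
	// Fewer than 8 bytes left: compare byte by byte.
	for n < len(a) && a[n] == b[n] {
		n++
	}
	return n
}

func main() {
	fmt.Println(matchLenSketch([]byte("gopher-gopher"), []byte("gopher-gophez"))) // 12
	fmt.Println(matchLenSketch([]byte("abc"), []byte("abd")))                     // 2
}
```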
diff --git a/src/compress/flate/dict_decoder.go b/src/compress/flate/dict_decoder.go
index d2c1904..cb855ab 100644
--- a/src/compress/flate/dict_decoder.go
+++ b/src/compress/flate/dict_decoder.go
@@ -104,10 +104,7 @@
dstBase := dd.wrPos
dstPos := dstBase
srcPos := dstPos - dist
- endPos := dstPos + length
- if endPos > len(dd.hist) {
- endPos = len(dd.hist)
- }
+ endPos := min(dstPos+length, len(dd.hist))
// Copy non-overlapping section after destination position.
//
@@ -160,8 +157,10 @@
srcPos := dstPos - dist
// Copy possibly overlapping section before destination position.
- for dstPos < endPos {
- dstPos += copy(dd.hist[dstPos:endPos], dd.hist[srcPos:dstPos])
+loop:
+ dstPos += copy(dd.hist[dstPos:endPos], dd.hist[srcPos:dstPos])
+ if dstPos < endPos {
+ goto loop // Avoid for-loop so that this function can be inlined
}
dd.wrPos = dstPos
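The goto-based copy above handles the overlapping case of a DEFLATE back-reference: when the distance is smaller than the length, each `copy` call reads what the previous iteration just wrote, so the available source region at least doubles per pass. A minimal standalone sketch of that expanding copy (names are illustrative, not the package's):

```go
package main

import "fmt"

// expandCopy fills dst[pos:pos+length] from dist bytes back, allowing
// overlap (dist < length) as a DEFLATE back-reference requires.
// hist must have capacity for pos+length bytes.
func expandCopy(hist []byte, pos, dist, length int) []byte {
	hist = hist[:pos+length]
	src := pos - dist
	for pos < len(hist) {
		// Source window [src:pos] grows as the destination fills,
		// so the copied region at least doubles each iteration.
		pos += copy(hist[pos:], hist[src:pos])
	}
	return hist
}

func main() {
	h := make([]byte, 2, 16)
	copy(h, "ab")
	// Back-reference: distance 2, length 6, starting at position 2.
	fmt.Printf("%s\n", expandCopy(h, 2, 2, 6)) // abababab
}
```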
diff --git a/src/compress/flate/example_test.go b/src/compress/flate/example_test.go
index 5780092..3af5c1d 100644
--- a/src/compress/flate/example_test.go
+++ b/src/compress/flate/example_test.go
@@ -93,7 +93,7 @@
var b bytes.Buffer
// Compress the data using the specially crafted dictionary.
- zw, err := flate.NewWriterDict(&b, flate.DefaultCompression, []byte(dict))
+ zw, err := flate.NewWriterDict(&b, flate.BestCompression, []byte(dict))
if err != nil {
log.Fatal(err)
}
@@ -168,6 +168,7 @@
wg.Add(1)
go func() {
defer wg.Done()
+ defer wp.Close()
zw, err := flate.NewWriter(wp, flate.BestSpeed)
if err != nil {
diff --git a/src/compress/flate/fuzz_test.go b/src/compress/flate/fuzz_test.go
new file mode 100644
index 0000000..1ea8cc4
--- /dev/null
+++ b/src/compress/flate/fuzz_test.go
@@ -0,0 +1,111 @@
+package flate
+
+import (
+ "bytes"
+ "flag"
+ "io"
+ "os"
+ "strconv"
+ "testing"
+)
+
+// Fuzzing tweaks:
+var fuzzStartF = flag.Int("start", HuffmanOnly, "Start fuzzing at this level")
+var fuzzEndF = flag.Int("end", BestCompression, "End fuzzing at this level (inclusive)")
+var fuzzMaxF = flag.Int("max", 1<<20, "Maximum input size")
+
+func TestMain(m *testing.M) {
+ flag.Parse()
+ os.Exit(m.Run())
+}
+
+// FuzzEncoding tests encoding and decoding by doing roundtrips.
+// Every input is compressed and decompressed at every level.
+// Note: When running the fuzzer, it may hit the 10-second timeout on slower CPUs.
+func FuzzEncoding(f *testing.F) {
+ startFuzz := *fuzzStartF
+ endFuzz := *fuzzEndF
+ maxSize := *fuzzMaxF
+
+ decoder := NewReader(nil)
+ buf, buf2 := new(bytes.Buffer), new(bytes.Buffer)
+ encs := make([]*Writer, endFuzz-startFuzz+1)
+ for i := range encs {
+ var err error
+ encs[i], err = NewWriter(nil, i+startFuzz)
+ if err != nil {
+ f.Fatal(err.Error())
+ }
+ }
+
+ f.Fuzz(func(t *testing.T, data []byte) {
+ if len(data) > maxSize {
+ return
+ }
+ for level := startFuzz; level <= endFuzz; level++ {
+ if level == DefaultCompression {
+ continue // Already covered.
+ }
+ msg := "level " + strconv.Itoa(level) + ":"
+ buf.Reset()
+ fw := encs[level-startFuzz]
+ fw.Reset(buf)
+ n, err := fw.Write(data)
+ if n != len(data) {
+ t.Fatal(msg + "short write")
+ }
+ if err != nil {
+ t.Fatal(msg + err.Error())
+ }
+ err = fw.Close()
+ if err != nil {
+ t.Fatal(msg + err.Error())
+ }
+ compressed := buf.Bytes()
+ err = decoder.(Resetter).Reset(buf, nil)
+ if err != nil {
+ t.Fatal(msg + err.Error())
+ }
+ data2, err := io.ReadAll(decoder)
+ if err != nil {
+ t.Fatal(msg + err.Error())
+ }
+ if !bytes.Equal(data, data2) {
+ t.Fatal(msg + "decompressed not equal")
+ }
+
+ // Do it again...
+ msg = "level " + strconv.Itoa(level) + " (reset):"
+ buf2.Reset()
+ fw.Reset(buf2)
+ n, err = fw.Write(data)
+ if n != len(data) {
+ t.Fatal(msg + "short write")
+ }
+ if err != nil {
+ t.Fatal(msg + err.Error())
+ }
+ err = fw.Close()
+ if err != nil {
+ t.Fatal(msg + err.Error())
+ }
+ compressed2 := buf2.Bytes()
+ err = decoder.(Resetter).Reset(buf2, nil)
+ if err != nil {
+ t.Fatal(msg + err.Error())
+ }
+ data2, err = io.ReadAll(decoder)
+ if err != nil {
+ t.Fatal(msg + err.Error())
+ }
+ if !bytes.Equal(data, data2) {
+ t.Fatal(msg + "decompressed not equal")
+ }
+			// Determinism failures will usually not be reproducible in isolation,
+			// since the output often depends on the internal state of the compressor.
+ if !bytes.Equal(compressed, compressed2) {
+ t.Fatal(msg + "non-deterministic output")
+ }
+ }
+ })
+}
diff --git a/src/compress/flate/huffman_bit_writer.go b/src/compress/flate/huffman_bit_writer.go
index d68c77f..f5e5092 100644
--- a/src/compress/flate/huffman_bit_writer.go
+++ b/src/compress/flate/huffman_bit_writer.go
@@ -6,6 +6,7 @@
import (
"io"
+ "math"
)
const (
@@ -22,20 +23,22 @@
codegenCodeCount = 19
badCode = 255
+ // maxPredefinedTokens is the maximum number of tokens
+ // where we check if fixed size is smaller.
+ maxPredefinedTokens = 250
+
// bufferFlushSize indicates the buffer size
// after which bytes are flushed to the writer.
// Should preferably be a multiple of 6, since
// we accumulate 6 bytes between writes to the buffer.
- bufferFlushSize = 240
-
- // bufferSize is the actual output byte buffer size.
- // It must have additional headroom for a flush
- // which can contain up to 8 bytes.
- bufferSize = bufferFlushSize + 8
+ bufferFlushSize = 246
)
+// Minimum length code that emits bits.
+const lengthExtraBitsMinCode = 8
+
// The number of extra bits needed by length code X - LENGTH_CODES_START.
-var lengthExtraBits = []int8{
+var lengthExtraBits = [32]uint8{
/* 257 */ 0, 0, 0,
/* 260 */ 0, 0, 0, 0, 0, 1, 1, 1, 1, 2,
/* 270 */ 2, 2, 2, 3, 3, 3, 3, 4, 4, 4,
@@ -43,26 +46,47 @@
}
// The length indicated by length code X - LENGTH_CODES_START.
-var lengthBase = []uint32{
+var lengthBase = [32]uint8{
0, 1, 2, 3, 4, 5, 6, 7, 8, 10,
12, 14, 16, 20, 24, 28, 32, 40, 48, 56,
64, 80, 96, 112, 128, 160, 192, 224, 255,
}
+// Minimum offset code that emits bits.
+const offsetExtraBitsMinCode = 4
+
// offset code word extra bits.
-var offsetExtraBits = []int8{
+var offsetExtraBits = [32]int8{
0, 0, 0, 0, 1, 1, 2, 2, 3, 3,
4, 4, 5, 5, 6, 6, 7, 7, 8, 8,
9, 9, 10, 10, 11, 11, 12, 12, 13, 13,
+ /* extended window */
+ 14, 14,
}
-var offsetBase = []uint32{
- 0x000000, 0x000001, 0x000002, 0x000003, 0x000004,
- 0x000006, 0x000008, 0x00000c, 0x000010, 0x000018,
- 0x000020, 0x000030, 0x000040, 0x000060, 0x000080,
- 0x0000c0, 0x000100, 0x000180, 0x000200, 0x000300,
- 0x000400, 0x000600, 0x000800, 0x000c00, 0x001000,
- 0x001800, 0x002000, 0x003000, 0x004000, 0x006000,
+var offsetCombined = [32]uint32{}
+
+func init() {
+ var offsetBase = [32]uint32{
+ /* normal deflate */
+ 0x000000, 0x000001, 0x000002, 0x000003, 0x000004,
+ 0x000006, 0x000008, 0x00000c, 0x000010, 0x000018,
+ 0x000020, 0x000030, 0x000040, 0x000060, 0x000080,
+ 0x0000c0, 0x000100, 0x000180, 0x000200, 0x000300,
+ 0x000400, 0x000600, 0x000800, 0x000c00, 0x001000,
+ 0x001800, 0x002000, 0x003000, 0x004000, 0x006000,
+
+ /* extended window */
+ 0x008000, 0x00c000,
+ }
+
+ for i := range offsetCombined[:] {
+		// Skip codes without extra bits and the extended window values.
+ if offsetExtraBits[i] == 0 || offsetBase[i] > 0x006000 {
+ continue
+ }
+ offsetCombined[i] = uint32(offsetExtraBits[i]) | (offsetBase[i] << 8)
+ }
}
// The odd order in which the codegen code sizes are written.
@@ -75,29 +99,49 @@
writer io.Writer
// Data waiting to be written is bytes[0:nbytes]
- // and then the low nbits of bits. Data is always written
- // sequentially into the bytes array.
- bits uint64
- nbits uint
- bytes [bufferSize]byte
- codegenFreq [codegenCodeCount]int32
- nbytes int
- literalFreq []int32
- offsetFreq []int32
- codegen []uint8
- literalEncoding *huffmanEncoder
- offsetEncoding *huffmanEncoder
- codegenEncoding *huffmanEncoder
- err error
+ // and then the low nbits of bits.
+ bits uint64
+ nbits uint8
+ nbytes uint8
+ lastHuffMan bool
+ literalEncoding *huffmanEncoder
+ tmpLitEncoding *huffmanEncoder
+ offsetEncoding *huffmanEncoder
+ codegenEncoding *huffmanEncoder
+ err error
+ lastHeader int
+ logNewTablePenalty uint // Bigger values will reduce the penalty of a new table.
+ bytes [256 + 8]byte
+ literalFreq [lengthCodesStart + 32]uint16
+ offsetFreq [32]uint16
+ codegenFreq [codegenCodeCount]uint16
+
+ // codegen must have an extra space for the final symbol.
+ codegen [literalCount + offsetCodeCount + 1]uint8
}
+// The huffmanBitWriter supports reusing Huffman tables and will combine
+// blocks when doing so compresses better than emitting a new table.
+//
+// This is controlled by several variables:
+//
+// If 'lastHeader' is non-zero the Huffman table can be reused.
+// It also indicates that an EOB has not yet been emitted, so if a new table
+// is generated, an EOB with the previous table must be written.
+//
+// If 'lastHuffMan' is set, a table for outputting literals
+// has been generated and offsets are invalid.
+//
+// An incoming block estimates the output size of a fresh table by
+// calculating the optimal size and adding a penalty.
+// A generated Huffman table is rarely optimal, which is why we add a penalty,
+// and emitting a new table is slower for both compression and decompression.
+
func newHuffmanBitWriter(w io.Writer) *huffmanBitWriter {
return &huffmanBitWriter{
writer: w,
- literalFreq: make([]int32, maxNumLit),
- offsetFreq: make([]int32, offsetCodeCount),
- codegen: make([]uint8, maxNumLit+offsetCodeCount+1),
- literalEncoding: newHuffmanEncoder(maxNumLit),
+ literalEncoding: newHuffmanEncoder(literalCount),
+ tmpLitEncoding: newHuffmanEncoder(literalCount),
codegenEncoding: newHuffmanEncoder(codegenCodeCount),
offsetEncoding: newHuffmanEncoder(offsetCodeCount),
}
@@ -106,6 +150,37 @@
func (w *huffmanBitWriter) reset(writer io.Writer) {
w.writer = writer
w.bits, w.nbits, w.nbytes, w.err = 0, 0, 0, nil
+ w.lastHeader = 0
+ w.lastHuffMan = false
+}
+
+func (w *huffmanBitWriter) canReuse(t *tokens) (ok bool) {
+ a := t.offHist[:offsetCodeCount]
+ b := w.offsetEncoding.codes
+ b = b[:len(a)]
+ for i, v := range a {
+ if v != 0 && b[i].zero() {
+ return false
+ }
+ }
+
+ a = t.extraHist[:literalCount-256]
+ b = w.literalEncoding.codes[256:literalCount]
+ b = b[:len(a)]
+ for i, v := range a {
+ if v != 0 && b[i].zero() {
+ return false
+ }
+ }
+
+ a = t.litHist[:256]
+ b = w.literalEncoding.codes[:len(a)]
+ for i, v := range a {
+ if v != 0 && b[i].zero() {
+ return false
+ }
+ }
+ return true
}
func (w *huffmanBitWriter) flush() {
@@ -113,6 +188,11 @@
w.nbits = 0
return
}
+ if w.lastHeader > 0 {
+ // We owe an EOB
+ w.writeCode(w.literalEncoding.codes[endBlockMarker])
+ w.lastHeader = 0
+ }
n := w.nbytes
for w.nbits != 0 {
w.bytes[n] = byte(w.bits)
@@ -125,7 +205,9 @@
n++
}
w.bits = 0
- w.write(w.bytes[:n])
+ if n > 0 {
+ w.write(w.bytes[:n])
+ }
w.nbytes = 0
}
@@ -136,30 +218,11 @@
_, w.err = w.writer.Write(b)
}
-func (w *huffmanBitWriter) writeBits(b int32, nb uint) {
- if w.err != nil {
- return
- }
- w.bits |= uint64(b) << w.nbits
+func (w *huffmanBitWriter) writeBits(b int32, nb uint8) {
+ w.bits |= uint64(b) << (w.nbits & 63)
w.nbits += nb
if w.nbits >= 48 {
- bits := w.bits
- w.bits >>= 48
- w.nbits -= 48
- n := w.nbytes
- bytes := w.bytes[n : n+6]
- bytes[0] = byte(bits)
- bytes[1] = byte(bits >> 8)
- bytes[2] = byte(bits >> 16)
- bytes[3] = byte(bits >> 24)
- bytes[4] = byte(bits >> 32)
- bytes[5] = byte(bits >> 40)
- n += 6
- if n >= bufferFlushSize {
- w.write(w.bytes[:n])
- n = 0
- }
- w.nbytes = n
+ w.writeOutBits()
}
}
@@ -198,21 +261,23 @@
// numOffsets The number of offsets in offsetEncoding
// litenc, offenc The literal and offset encoder to use
func (w *huffmanBitWriter) generateCodegen(numLiterals int, numOffsets int, litEnc, offEnc *huffmanEncoder) {
- clear(w.codegenFreq[:])
+ for i := range w.codegenFreq {
+ w.codegenFreq[i] = 0
+ }
// Note that we are using codegen both as a temporary variable for holding
// a copy of the frequencies, and as the place where we put the result.
// This is fine because the output is always shorter than the input used
// so far.
- codegen := w.codegen // cache
+ codegen := w.codegen[:] // cache
// Copy the concatenated code sizes to codegen. Put a marker at the end.
cgnl := codegen[:numLiterals]
for i := range cgnl {
- cgnl[i] = uint8(litEnc.codes[i].len)
+ cgnl[i] = litEnc.codes[i].len()
}
cgnl = codegen[numLiterals : numLiterals+numOffsets]
for i := range cgnl {
- cgnl[i] = uint8(offEnc.codes[i].len)
+ cgnl[i] = offEnc.codes[i].len()
}
codegen[numLiterals+numOffsets] = badCode
@@ -234,10 +299,7 @@
w.codegenFreq[size]++
count--
for count >= 3 {
- n := 6
- if n > count {
- n = count
- }
+ n := min(6, count)
codegen[outIndex] = 16
outIndex++
codegen[outIndex] = uint8(n - 3)
@@ -247,10 +309,7 @@
}
} else {
for count >= 11 {
- n := 138
- if n > count {
- n = count
- }
+ n := min(138, count)
codegen[outIndex] = 18
outIndex++
codegen[outIndex] = uint8(n - 11)
@@ -282,30 +341,61 @@
codegen[outIndex] = badCode
}
-// dynamicSize returns the size of dynamically encoded data in bits.
-func (w *huffmanBitWriter) dynamicSize(litEnc, offEnc *huffmanEncoder, extraBits int) (size, numCodegens int) {
+func (w *huffmanBitWriter) codegens() int {
+ numCodegens := len(w.codegenFreq)
+ for numCodegens > 4 && w.codegenFreq[codegenOrder[numCodegens-1]] == 0 {
+ numCodegens--
+ }
+ return numCodegens
+}
+
+func (w *huffmanBitWriter) headerSize() (size, numCodegens int) {
numCodegens = len(w.codegenFreq)
for numCodegens > 4 && w.codegenFreq[codegenOrder[numCodegens-1]] == 0 {
numCodegens--
}
- header := 3 + 5 + 5 + 4 + (3 * numCodegens) +
+ return 3 + 5 + 5 + 4 + (3 * numCodegens) +
w.codegenEncoding.bitLength(w.codegenFreq[:]) +
int(w.codegenFreq[16])*2 +
int(w.codegenFreq[17])*3 +
- int(w.codegenFreq[18])*7
- size = header +
- litEnc.bitLength(w.literalFreq) +
- offEnc.bitLength(w.offsetFreq) +
- extraBits
+ int(w.codegenFreq[18])*7, numCodegens
+}
+// dynamicReuseSize returns the size, in bits, of the dynamically encoded data when reusing litEnc and offEnc.
+func (w *huffmanBitWriter) dynamicReuseSize(litEnc, offEnc *huffmanEncoder) (size int) {
+ size = litEnc.bitLength(w.literalFreq[:]) +
+ offEnc.bitLength(w.offsetFreq[:])
+ return size
+}
+
+// dynamicSize returns the size of dynamically encoded data in bits.
+func (w *huffmanBitWriter) dynamicSize(litEnc, offEnc *huffmanEncoder, extraBits int) (size, numCodegens int) {
+ header, numCodegens := w.headerSize()
+ size = header +
+ litEnc.bitLength(w.literalFreq[:]) +
+ offEnc.bitLength(w.offsetFreq[:]) +
+ extraBits
return size, numCodegens
}
+// extraBitSize will return the number of bits that will be written
+// as "extra" bits on matches.
+func (w *huffmanBitWriter) extraBitSize() int {
+ total := 0
+ for i, n := range w.literalFreq[257:literalCount] {
+ total += int(n) * int(lengthExtraBits[i&31])
+ }
+ for i, n := range w.offsetFreq[:offsetCodeCount] {
+ total += int(n) * int(offsetExtraBits[i&31])
+ }
+ return total
+}
+
// fixedSize returns the size of dynamically encoded data in bits.
func (w *huffmanBitWriter) fixedSize(extraBits int) int {
return 3 +
- fixedLiteralEncoding.bitLength(w.literalFreq) +
- fixedOffsetEncoding.bitLength(w.offsetFreq) +
+ fixedLiteralEncoding.bitLength(w.literalFreq[:]) +
+ fixedOffsetEncoding.bitLength(w.offsetFreq[:]) +
extraBits
}
@@ -323,32 +413,37 @@
}
func (w *huffmanBitWriter) writeCode(c hcode) {
- if w.err != nil {
- return
- }
- w.bits |= uint64(c.code) << w.nbits
- w.nbits += uint(c.len)
+ // The function does not get inlined if we "& 63" the shift.
+ w.bits |= c.code64() << (w.nbits & reg8SizeMask64)
+ w.nbits += c.len()
if w.nbits >= 48 {
- bits := w.bits
- w.bits >>= 48
- w.nbits -= 48
- n := w.nbytes
- bytes := w.bytes[n : n+6]
- bytes[0] = byte(bits)
- bytes[1] = byte(bits >> 8)
- bytes[2] = byte(bits >> 16)
- bytes[3] = byte(bits >> 24)
- bytes[4] = byte(bits >> 32)
- bytes[5] = byte(bits >> 40)
- n += 6
- if n >= bufferFlushSize {
- w.write(w.bytes[:n])
- n = 0
- }
- w.nbytes = n
+ w.writeOutBits()
}
}
+// writeOutBits flushes 48 bits from the bit accumulator to the byte buffer.
+func (w *huffmanBitWriter) writeOutBits() {
+ bits := w.bits
+ w.bits >>= 48
+ w.nbits -= 48
+ n := w.nbytes
+
+	// Store 8 bytes but advance only 6; the 2-byte overwrite is faster than an exact store.
+ storeLE64(w.bytes[n:], bits)
+ n += 6
+
+ if n >= bufferFlushSize {
+ if w.err != nil {
+ n = 0
+ return
+ }
+ w.write(w.bytes[:n])
+ n = 0
+ }
+
+ w.nbytes = n
+}
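The writer above packs codes LSB-first into a 64-bit accumulator and flushes 6 bytes (48 bits) at a time, leaving 16 bits of headroom so a code can always be OR'ed in before the flush check. A minimal sketch of that accumulator pattern, separate from the patch's types:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// bitPacker is a minimal sketch of the 64-bit accumulator used by the
// bit writer: codes are OR'ed in LSB-first and 6 bytes are emitted once
// 48 or more bits are pending.
type bitPacker struct {
	bits  uint64
	nbits uint8
	out   []byte
}

func (p *bitPacker) writeBits(code uint64, n uint8) {
	p.bits |= code << (p.nbits & 63)
	p.nbits += n
	if p.nbits >= 48 {
		var buf [8]byte
		binary.LittleEndian.PutUint64(buf[:], p.bits) // store 8, keep 6
		p.out = append(p.out, buf[:6]...)
		p.bits >>= 48
		p.nbits -= 48
	}
}

// flush writes out any remaining partial bytes, zero-padded.
func (p *bitPacker) flush() {
	for p.nbits > 0 {
		p.out = append(p.out, byte(p.bits))
		p.bits >>= 8
		if p.nbits > 8 {
			p.nbits -= 8
		} else {
			p.nbits = 0
		}
	}
}

func main() {
	var p bitPacker
	p.writeBits(0b101, 3) // first code occupies the low bits
	p.writeBits(0b11, 2)  // next code lands above it
	p.flush()
	fmt.Printf("%08b\n", p.out) // [00011101]
}
```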
+
// Write the header of a dynamic Huffman block to the output stream.
//
// numLiterals The number of literals specified in codegen
@@ -367,19 +462,19 @@
w.writeBits(int32(numOffsets-1), 5)
w.writeBits(int32(numCodegens-4), 4)
- for i := 0; i < numCodegens; i++ {
- value := uint(w.codegenEncoding.codes[codegenOrder[i]].len)
+ for i := range numCodegens {
+ value := uint(w.codegenEncoding.codes[codegenOrder[i]].len())
w.writeBits(int32(value), 3)
}
i := 0
for {
- var codeWord int = int(w.codegen[i])
+ var codeWord = uint32(w.codegen[i])
i++
if codeWord == badCode {
break
}
- w.writeCode(w.codegenEncoding.codes[uint32(codeWord)])
+ w.writeCode(w.codegenEncoding.codes[codeWord])
switch codeWord {
case 16:
@@ -395,10 +490,28 @@
}
}
+// writeStoredHeader will write a stored header.
+// If the stored block is only used for EOF,
+// it is replaced with a fixed Huffman block.
func (w *huffmanBitWriter) writeStoredHeader(length int, isEof bool) {
if w.err != nil {
return
}
+ if w.lastHeader > 0 {
+ // We owe an EOB
+ w.writeCode(w.literalEncoding.codes[endBlockMarker])
+ w.lastHeader = 0
+ }
+
+ // To write EOF, use a fixed encoding block. 10 bits instead of 5 bytes.
+ if length == 0 && isEof {
+ w.writeFixedHeader(isEof)
+ // EOB: 7 bits, value: 0
+ w.writeBits(0, 7)
+ w.flush()
+ return
+ }
+
var flag int32
if isEof {
flag = 1
@@ -413,6 +526,12 @@
if w.err != nil {
return
}
+ if w.lastHeader > 0 {
+ // We owe an EOB
+ w.writeCode(w.literalEncoding.codes[endBlockMarker])
+ w.lastHeader = 0
+ }
+
// Indicate that we are a fixed Huffman block
var value int32 = 2
if isEof {
@@ -426,36 +545,33 @@
// is larger than the original bytes, the data will be written as a
// stored block.
// If the input is nil, the tokens will always be Huffman encoded.
-func (w *huffmanBitWriter) writeBlock(tokens []token, eof bool, input []byte) {
+func (w *huffmanBitWriter) writeBlock(tokens *tokens, eof bool, input []byte) {
if w.err != nil {
return
}
- tokens = append(tokens, endBlockMarker)
+ tokens.AddEOB()
+ if w.lastHeader > 0 {
+ // We owe an EOB
+ w.writeCode(w.literalEncoding.codes[endBlockMarker])
+ w.lastHeader = 0
+ }
numLiterals, numOffsets := w.indexTokens(tokens)
-
+ w.generate()
var extraBits int
storedSize, storable := w.storedSize(input)
if storable {
- // We only bother calculating the costs of the extra bits required by
- // the length of offset fields (which will be the same for both fixed
- // and dynamic encoding), if we need to compare those two encodings
- // against stored encoding.
- for lengthCode := lengthCodesStart + 8; lengthCode < numLiterals; lengthCode++ {
- // First eight length codes have extra size = 0.
- extraBits += int(w.literalFreq[lengthCode]) * int(lengthExtraBits[lengthCode-lengthCodesStart])
- }
- for offsetCode := 4; offsetCode < numOffsets; offsetCode++ {
- // First four offset codes have extra size = 0.
- extraBits += int(w.offsetFreq[offsetCode]) * int(offsetExtraBits[offsetCode])
- }
+ extraBits = w.extraBitSize()
}
// Figure out smallest code.
// Fixed Huffman baseline.
var literalEncoding = fixedLiteralEncoding
var offsetEncoding = fixedOffsetEncoding
- var size = w.fixedSize(extraBits)
+ var size = math.MaxInt32
+ if tokens.n < maxPredefinedTokens {
+ size = w.fixedSize(extraBits)
+ }
// Dynamic Huffman?
var numCodegens int
@@ -473,7 +589,7 @@
}
// Stored bytes?
- if storable && storedSize < size {
+ if storable && storedSize <= size {
w.writeStoredHeader(len(input), eof)
w.writeBytes(input)
return
@@ -487,7 +603,7 @@
}
// Write the tokens.
- w.writeTokens(tokens, literalEncoding.codes, offsetEncoding.codes)
+ w.writeTokens(tokens.Slice(), literalEncoding.codes, offsetEncoding.codes)
}
// writeBlockDynamic encodes a block using a dynamic Huffman table.
@@ -495,53 +611,153 @@
// histogram distribution.
// If input is supplied and the compression savings are below 1/16th of the
// input size the block is stored.
-func (w *huffmanBitWriter) writeBlockDynamic(tokens []token, eof bool, input []byte) {
+func (w *huffmanBitWriter) writeBlockDynamic(tokens *tokens, eof bool, input []byte, sync bool) {
if w.err != nil {
return
}
- tokens = append(tokens, endBlockMarker)
- numLiterals, numOffsets := w.indexTokens(tokens)
-
- // Generate codegen and codegenFrequencies, which indicates how to encode
- // the literalEncoding and the offsetEncoding.
- w.generateCodegen(numLiterals, numOffsets, w.literalEncoding, w.offsetEncoding)
- w.codegenEncoding.generate(w.codegenFreq[:], 7)
- size, numCodegens := w.dynamicSize(w.literalEncoding, w.offsetEncoding, 0)
-
- // Store bytes, if we don't get a reasonable improvement.
- if ssize, storable := w.storedSize(input); storable && ssize < (size+size>>4) {
- w.writeStoredHeader(len(input), eof)
- w.writeBytes(input)
- return
+ sync = sync || eof
+ if sync {
+ tokens.AddEOB()
}
- // Write Huffman table.
- w.writeDynamicHeader(numLiterals, numOffsets, numCodegens, eof)
+ // We cannot reuse a pure Huffman table, and at EOF we must terminate the open block.
+ if (w.lastHuffMan || eof) && w.lastHeader > 0 {
+ // We will not try to reuse.
+ w.writeCode(w.literalEncoding.codes[endBlockMarker])
+ w.lastHeader = 0
+ w.lastHuffMan = false
+ }
+ if w.lastHeader > 0 && !w.canReuse(tokens) {
+ w.writeCode(w.literalEncoding.codes[endBlockMarker])
+ w.lastHeader = 0
+ }
+
+ numLiterals, numOffsets := w.indexTokens(tokens)
+ extraBits := 0
+ ssize, storable := w.storedSize(input)
+
+ if storable || w.lastHeader > 0 {
+ extraBits = w.extraBitSize()
+ }
+
+ var size int
+
+ // Check if we should reuse.
+ if w.lastHeader > 0 {
+ // Estimate size for using a new table.
+ // Use the previous header size as the best estimate.
+ newSize := w.lastHeader + tokens.EstimatedBits()
+
+ // The estimated size assumes an optimal table.
+ // Add a penalty to make it more realistic and favor reuse a bit more.
+ newSize += int(w.literalEncoding.codes[endBlockMarker].len()) + newSize>>w.logNewTablePenalty
+
+ // Calculate the size for reusing the current table.
+ reuseSize := w.dynamicReuseSize(w.literalEncoding, w.offsetEncoding) + extraBits
+
+ // Check if a new table is better.
+ if newSize < reuseSize {
+ // Write the EOB we owe.
+ w.writeCode(w.literalEncoding.codes[endBlockMarker])
+ size = newSize
+ w.lastHeader = 0
+ } else {
+ size = reuseSize
+ }
+
+ // Small blocks can be more efficient with fixed encoding.
+ if tokens.n < maxPredefinedTokens {
+ if preSize := w.fixedSize(extraBits) + 7; preSize < size {
+ // Check if we get a reasonable size decrease.
+ if storable && ssize <= size {
+ w.writeStoredHeader(len(input), eof)
+ w.writeBytes(input)
+ return
+ }
+ w.writeFixedHeader(eof)
+ if !sync {
+ tokens.AddEOB()
+ }
+ w.writeTokens(tokens.Slice(), fixedLiteralEncoding.codes, fixedOffsetEncoding.codes)
+ return
+ }
+ }
+
+ // Check if we get a reasonable size decrease.
+ if storable && ssize <= size {
+ w.writeStoredHeader(len(input), eof)
+ w.writeBytes(input)
+ return
+ }
+ }
+
+ // We want a new block/table
+ if w.lastHeader == 0 {
+ w.literalFreq[endBlockMarker] = 1
+
+ w.generate()
+ // Generate codegen and codegenFrequencies, which indicates how to encode
+ // the literalEncoding and the offsetEncoding.
+ w.generateCodegen(numLiterals, numOffsets, w.literalEncoding, w.offsetEncoding)
+ w.codegenEncoding.generate(w.codegenFreq[:], 7)
+
+ var numCodegens int
+ size, numCodegens = w.dynamicSize(w.literalEncoding, w.offsetEncoding, extraBits)
+
+ // Store predefined or raw, if we don't get a reasonable improvement.
+ if tokens.n < maxPredefinedTokens {
+ if preSize := w.fixedSize(extraBits); preSize <= size {
+ // Store bytes, if we don't get an improvement.
+ if storable && ssize <= preSize {
+ w.writeStoredHeader(len(input), eof)
+ w.writeBytes(input)
+ return
+ }
+ w.writeFixedHeader(eof)
+ if !sync {
+ tokens.AddEOB()
+ }
+ w.writeTokens(tokens.Slice(), fixedLiteralEncoding.codes, fixedOffsetEncoding.codes)
+ return
+ }
+ }
+
+ if storable && ssize <= size {
+ // Store bytes, if we don't get an improvement.
+ w.writeStoredHeader(len(input), eof)
+ w.writeBytes(input)
+ return
+ }
+
+ // Write Huffman table.
+ w.writeDynamicHeader(numLiterals, numOffsets, numCodegens, eof)
+ if !sync {
+ w.lastHeader, _ = w.headerSize()
+ }
+ w.lastHuffMan = false
+ }
+
+ if sync {
+ w.lastHeader = 0
+ }
// Write the tokens.
- w.writeTokens(tokens, w.literalEncoding.codes, w.offsetEncoding.codes)
+ w.writeTokens(tokens.Slice(), w.literalEncoding.codes, w.offsetEncoding.codes)
}
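
The reuse-versus-new-table decision above compares two size estimates, adding a shift-based penalty to the new-table estimate so that reuse wins unless a fresh table is clearly smaller. A minimal standalone sketch of that heuristic (the `chooseTable` helper and the bit counts in `main` are illustrative, not part of the patch):

```go
package main

import "fmt"

// chooseTable sketches the decision in writeBlockDynamic: the estimated cost
// of encoding with a new table gets a penalty of size>>logNewTablePenalty
// (since the estimate assumes an optimal table), so reuse is preferred
// unless a fresh table is clearly smaller.
func chooseTable(headerBits, estTokenBits, reuseBits, logNewTablePenalty int) string {
	newSize := headerBits + estTokenBits
	newSize += newSize >> logNewTablePenalty // penalty favors reuse
	if newSize < reuseBits {
		return "new"
	}
	return "reuse"
}

func main() {
	// With a 4-bit penalty shift (~6%), a marginally smaller new table loses.
	fmt.Println(chooseTable(560, 10000, 10900, 4)) // reuse
	// A much smaller new table still wins despite the penalty.
	fmt.Println(chooseTable(560, 6000, 10900, 4)) // new
}
```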
// indexTokens indexes a slice of tokens, and updates
// literalFreq and offsetFreq, and generates literalEncoding
// and offsetEncoding.
// The number of literal and offset tokens is returned.
-func (w *huffmanBitWriter) indexTokens(tokens []token) (numLiterals, numOffsets int) {
- clear(w.literalFreq)
- clear(w.offsetFreq)
+func (w *huffmanBitWriter) indexTokens(t *tokens) (numLiterals, numOffsets int) {
+ *(*[256]uint16)(w.literalFreq[:]) = t.litHist
+ *(*[32]uint16)(w.literalFreq[256:]) = t.extraHist
+ w.offsetFreq = t.offHist
- for _, t := range tokens {
- if t < matchType {
- w.literalFreq[t.literal()]++
- continue
- }
- length := t.length()
- offset := t.offset()
- w.literalFreq[lengthCodesStart+lengthCode(length)]++
- w.offsetFreq[offsetCode(offset)]++
+ if t.n == 0 {
+ return
}
-
// get the number of literals
numLiterals = len(w.literalFreq)
for w.literalFreq[numLiterals-1] == 0 {
@@ -558,40 +774,152 @@
w.offsetFreq[0] = 1
numOffsets = 1
}
- w.literalEncoding.generate(w.literalFreq, 15)
- w.offsetEncoding.generate(w.offsetFreq, 15)
return
}
+func (w *huffmanBitWriter) generate() {
+ w.literalEncoding.generate(w.literalFreq[:literalCount], 15)
+ w.offsetEncoding.generate(w.offsetFreq[:offsetCodeCount], 15)
+}
+
// writeTokens writes a slice of tokens to the output.
// codes for literal and offset encoding must be supplied.
func (w *huffmanBitWriter) writeTokens(tokens []token, leCodes, oeCodes []hcode) {
if w.err != nil {
return
}
+ if len(tokens) == 0 {
+ return
+ }
+
+ // Only the last token may be an endBlockMarker.
+ var deferEOB bool
+ if tokens[len(tokens)-1] == endBlockMarker {
+ tokens = tokens[:len(tokens)-1]
+ deferEOB = true
+ }
+
+ // Create slices up to the next power of two to avoid bounds checks.
+ lits := leCodes[:256]
+ offs := oeCodes[:32]
+ lengths := leCodes[lengthCodesStart:]
+ lengths = lengths[:32]
+
+ // Keep these in locals; the compiler can then keep them in registers.
+ bits, nbits, nbytes := w.bits, w.nbits, w.nbytes
+
for _, t := range tokens {
- if t < matchType {
- w.writeCode(leCodes[t.literal()])
+ if t < 256 {
+ c := lits[t]
+ bits |= c.code64() << (nbits & 63)
+ nbits += c.len()
+ if nbits >= 48 {
+ storeLE64(w.bytes[nbytes:], bits)
+ bits >>= 48
+ nbits -= 48
+ nbytes += 6
+ if nbytes >= bufferFlushSize {
+ if w.err != nil {
+ nbytes = 0
+ return
+ }
+ _, w.err = w.writer.Write(w.bytes[:nbytes])
+ nbytes = 0
+ }
+ }
continue
}
+
// Write the length
length := t.length()
- lengthCode := lengthCode(length)
- w.writeCode(leCodes[lengthCode+lengthCodesStart])
- extraLengthBits := uint(lengthExtraBits[lengthCode])
- if extraLengthBits > 0 {
- extraLength := int32(length - lengthBase[lengthCode])
- w.writeBits(extraLength, extraLengthBits)
+ lenCode := lengthCode(length) & 31
+ // inlined 'w.writeCode(lengths[lengthCode])'
+ c := lengths[lenCode]
+ bits |= c.code64() << (nbits & 63)
+ nbits += c.len()
+ if nbits >= 48 {
+ storeLE64(w.bytes[nbytes:], bits)
+ bits >>= 48
+ nbits -= 48
+ nbytes += 6
+ if nbytes >= bufferFlushSize {
+ if w.err != nil {
+ nbytes = 0
+ return
+ }
+ _, w.err = w.writer.Write(w.bytes[:nbytes])
+ nbytes = 0
+ }
+ }
+
+ if lenCode >= lengthExtraBitsMinCode {
+ extraLengthBits := lengthExtraBits[lenCode]
+ // inlined 'w.writeBits(extraLength, extraLengthBits)'
+ extraLength := int32(length - lengthBase[lenCode])
+ bits |= uint64(extraLength) << (nbits & 63)
+ nbits += extraLengthBits
+ if nbits >= 48 {
+ storeLE64(w.bytes[nbytes:], bits)
+ bits >>= 48
+ nbits -= 48
+ nbytes += 6
+ if nbytes >= bufferFlushSize {
+ if w.err != nil {
+ nbytes = 0
+ return
+ }
+ _, w.err = w.writer.Write(w.bytes[:nbytes])
+ nbytes = 0
+ }
+ }
}
// Write the offset
offset := t.offset()
- offsetCode := offsetCode(offset)
- w.writeCode(oeCodes[offsetCode])
- extraOffsetBits := uint(offsetExtraBits[offsetCode])
- if extraOffsetBits > 0 {
- extraOffset := int32(offset - offsetBase[offsetCode])
- w.writeBits(extraOffset, extraOffsetBits)
+ offCode := (offset >> 16) & 31
+ // inlined 'w.writeCode(offs[offCode])'
+ c = offs[offCode]
+ bits |= c.code64() << (nbits & 63)
+ nbits += c.len()
+ if nbits >= 48 {
+ storeLE64(w.bytes[nbytes:], bits)
+ bits >>= 48
+ nbits -= 48
+ nbytes += 6
+ if nbytes >= bufferFlushSize {
+ if w.err != nil {
+ nbytes = 0
+ return
+ }
+ _, w.err = w.writer.Write(w.bytes[:nbytes])
+ nbytes = 0
+ }
}
+
+ if offCode >= offsetExtraBitsMinCode {
+ offsetComb := offsetCombined[offCode]
+ bits |= uint64((offset-(offsetComb>>8))&matchOffsetOnlyMask) << (nbits & 63)
+ nbits += uint8(offsetComb)
+ if nbits >= 48 {
+ storeLE64(w.bytes[nbytes:], bits)
+ bits >>= 48
+ nbits -= 48
+ nbytes += 6
+ if nbytes >= bufferFlushSize {
+ if w.err != nil {
+ nbytes = 0
+ return
+ }
+ _, w.err = w.writer.Write(w.bytes[:nbytes])
+ nbytes = 0
+ }
+ }
+ }
+ }
+ // Restore...
+ w.bits, w.nbits, w.nbytes = bits, nbits, nbytes
+
+ if deferEOB {
+ w.writeCode(leCodes[endBlockMarker])
}
}
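
The inlined writer in writeTokens accumulates LSB-first codes in a 64-bit register and spills six whole bytes whenever at least 48 bits are pending. A minimal sketch of that technique, with a hypothetical `bitWriter` type (the real writer flushes into a fixed buffer and an io.Writer instead of appending to a slice):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// bitWriter sketches the buffering used in writeTokens: accumulate LSB-first
// codes in a uint64 and spill 6 whole bytes once at least 48 bits are
// pending. A literal plus extra bits never exceeds 16 bits per write, so the
// register cannot overflow between flushes.
type bitWriter struct {
	bits  uint64
	nbits uint8
	out   []byte
}

func (w *bitWriter) writeBits(code uint64, n uint8) {
	w.bits |= code << (w.nbits & 63)
	w.nbits += n
	if w.nbits >= 48 {
		var tmp [8]byte
		binary.LittleEndian.PutUint64(tmp[:], w.bits)
		w.out = append(w.out, tmp[:6]...)
		w.bits >>= 48
		w.nbits -= 48
	}
}

// flush writes out any remaining partial bytes, zero-padded.
func (w *bitWriter) flush() {
	for w.nbits > 0 {
		w.out = append(w.out, byte(w.bits))
		w.bits >>= 8
		if w.nbits < 8 {
			w.nbits = 0
		} else {
			w.nbits -= 8
		}
	}
}

func main() {
	var w bitWriter
	w.writeBits(0b1, 1)   // a single 1-bit
	w.writeBits(0b101, 3) // then 101, LSB-first
	w.flush()
	fmt.Printf("%08b\n", w.out[0]) // 00001011
}
```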
@@ -600,94 +928,168 @@
var huffOffset *huffmanEncoder
func init() {
- offsetFreq := make([]int32, offsetCodeCount)
- offsetFreq[0] = 1
+ w := newHuffmanBitWriter(nil)
+ w.offsetFreq[0] = 1
huffOffset = newHuffmanEncoder(offsetCodeCount)
- huffOffset.generate(offsetFreq, 15)
+ huffOffset.generate(w.offsetFreq[:offsetCodeCount], 15)
}
// writeBlockHuff encodes a block of bytes as either
// Huffman encoded literals or uncompressed bytes if the
// results only gains very little from compression.
-func (w *huffmanBitWriter) writeBlockHuff(eof bool, input []byte) {
+func (w *huffmanBitWriter) writeBlockHuff(eof bool, input []byte, sync bool) {
if w.err != nil {
return
}
// Clear histogram
- clear(w.literalFreq)
-
- // Add everything as literals
- histogram(input, w.literalFreq)
-
- w.literalFreq[endBlockMarker] = 1
+ for i := range w.literalFreq[:] {
+ w.literalFreq[i] = 0
+ }
+ if !w.lastHuffMan {
+ for i := range w.offsetFreq[:] {
+ w.offsetFreq[i] = 0
+ }
+ }
const numLiterals = endBlockMarker + 1
- w.offsetFreq[0] = 1
const numOffsets = 1
- w.literalEncoding.generate(w.literalFreq, 15)
-
- // Figure out smallest code.
- // Always use dynamic Huffman or Store
- var numCodegens int
-
- // Generate codegen and codegenFrequencies, which indicates how to encode
- // the literalEncoding and the offsetEncoding.
- w.generateCodegen(numLiterals, numOffsets, w.literalEncoding, huffOffset)
- w.codegenEncoding.generate(w.codegenFreq[:], 7)
- size, numCodegens := w.dynamicSize(w.literalEncoding, huffOffset, 0)
+ // Add everything as literals
+ // We have to estimate the header size.
+ // Assume header is around 70 bytes:
+ // https://stackoverflow.com/a/25454430
+ const guessHeaderSizeBits = 70 * 8
+ histogram(input, w.literalFreq[:numLiterals])
+ ssize, storable := w.storedSize(input)
+ if storable && len(input) > 1024 {
+ // Quick check for incompressible content.
+ abs := float64(0)
+ avg := float64(len(input)) / 256
+ max := float64(len(input) * 2)
+ for _, v := range w.literalFreq[:256] {
+ diff := float64(v) - avg
+ abs += diff * diff
+ if abs > max {
+ break
+ }
+ }
+ if abs < max {
+ // No chance we can compress this...
+ w.writeStoredHeader(len(input), eof)
+ w.writeBytes(input)
+ return
+ }
+ }
+ w.literalFreq[endBlockMarker] = 1
+ w.tmpLitEncoding.generate(w.literalFreq[:numLiterals], 15)
+ estBits := w.tmpLitEncoding.canReuseBits(w.literalFreq[:numLiterals])
+ if estBits < math.MaxInt32 {
+ estBits += w.lastHeader
+ if w.lastHeader == 0 {
+ estBits += guessHeaderSizeBits
+ }
+ estBits += estBits >> w.logNewTablePenalty
+ }
// Store bytes, if we don't get a reasonable improvement.
- if ssize, storable := w.storedSize(input); storable && ssize < (size+size>>4) {
+ if storable && ssize <= estBits {
w.writeStoredHeader(len(input), eof)
w.writeBytes(input)
return
}
- // Huffman.
- w.writeDynamicHeader(numLiterals, numOffsets, numCodegens, eof)
- encoding := w.literalEncoding.codes[:257]
- n := w.nbytes
+ if w.lastHeader > 0 {
+ reuseSize := w.literalEncoding.canReuseBits(w.literalFreq[:256])
+
+ if estBits < reuseSize {
+ // We owe an EOB
+ w.writeCode(w.literalEncoding.codes[endBlockMarker])
+ w.lastHeader = 0
+ }
+ }
+
+ if w.lastHeader == 0 {
+ // Use the temp encoding, so swap.
+ w.literalEncoding, w.tmpLitEncoding = w.tmpLitEncoding, w.literalEncoding
+ // Generate codegen and codegenFrequencies, which indicates how to encode
+ // the literalEncoding and the offsetEncoding.
+ w.generateCodegen(numLiterals, numOffsets, w.literalEncoding, huffOffset)
+ w.codegenEncoding.generate(w.codegenFreq[:], 7)
+ numCodegens := w.codegens()
+
+ // Huffman.
+ w.writeDynamicHeader(numLiterals, numOffsets, numCodegens, eof)
+ w.lastHuffMan = true
+ w.lastHeader, _ = w.headerSize()
+ }
+
+ encoding := w.literalEncoding.codes[:256]
+ // Keep these in locals so they stay in registers; at least 1.5x the speed.
+ bits, nbits, nbytes := w.bits, w.nbits, w.nbytes
+
+ // Unrolled: write 3 codes per iteration, measured as the fastest unroll factor.
+ for len(input) > 3 {
+ // We must have at least 48 bits free.
+ if nbits >= 8 {
+ n := nbits >> 3
+ storeLE64(w.bytes[nbytes:], bits)
+ bits >>= (n * 8) & 63
+ nbits -= n * 8
+ nbytes += n
+ }
+ if nbytes >= bufferFlushSize {
+ if w.err != nil {
+ nbytes = 0
+ return
+ }
+ _, w.err = w.writer.Write(w.bytes[:nbytes])
+ nbytes = 0
+ }
+ a, b := encoding[input[0]], encoding[input[1]]
+ bits |= a.code64() << (nbits & 63)
+ bits |= b.code64() << ((nbits + a.len()) & 63)
+ c := encoding[input[2]]
+ nbits += b.len() + a.len()
+ bits |= c.code64() << (nbits & 63)
+ nbits += c.len()
+ input = input[3:]
+ }
+
+ // Remaining...
for _, t := range input {
+ if nbits >= 48 {
+ storeLE64(w.bytes[nbytes:], bits)
+ bits >>= 48
+ nbits -= 48
+ nbytes += 6
+ if nbytes >= bufferFlushSize {
+ if w.err != nil {
+ nbytes = 0
+ return
+ }
+ _, w.err = w.writer.Write(w.bytes[:nbytes])
+ nbytes = 0
+ }
+ }
// Bitwriting inlined, ~30% speedup
c := encoding[t]
- w.bits |= uint64(c.code) << w.nbits
- w.nbits += uint(c.len)
- if w.nbits < 48 {
- continue
- }
- // Store 6 bytes
- bits := w.bits
- w.bits >>= 48
- w.nbits -= 48
- bytes := w.bytes[n : n+6]
- bytes[0] = byte(bits)
- bytes[1] = byte(bits >> 8)
- bytes[2] = byte(bits >> 16)
- bytes[3] = byte(bits >> 24)
- bytes[4] = byte(bits >> 32)
- bytes[5] = byte(bits >> 40)
- n += 6
- if n < bufferFlushSize {
- continue
- }
- w.write(w.bytes[:n])
- if w.err != nil {
- return // Return early in the event of write failures
- }
- n = 0
- }
- w.nbytes = n
- w.writeCode(encoding[endBlockMarker])
-}
+ bits |= c.code64() << (nbits & 63)
-// histogram accumulates a histogram of b in h.
-//
-// len(h) must be >= 256, and h's elements must be all zeroes.
-func histogram(b []byte, h []int32) {
- h = h[:256]
- for _, t := range b {
- h[t]++
+ nbits += c.len()
+ }
+ // Restore...
+ w.bits, w.nbits, w.nbytes = bits, nbits, nbytes
+
+ // Flush if needed to have space.
+ if w.nbits >= 48 {
+ w.writeOutBits()
+ }
+
+ if eof || sync {
+ w.writeCode(w.literalEncoding.codes[endBlockMarker])
+ w.lastHeader = 0
+ w.lastHuffMan = false
}
}
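
The quick check in writeBlockHuff rejects near-uniform input by summing the squared deviations of the byte histogram from a flat distribution; if the sum stays below `2*len(input)`, the block is stored verbatim. A standalone sketch (the `looksIncompressible` name is ours, not the patch's):

```go
package main

import "fmt"

// looksIncompressible sketches the heuristic from writeBlockHuff: when the
// byte histogram's sum of squared deviations from a flat distribution is
// below 2*len(input), the data is close to uniform and Huffman coding is
// unlikely to help, so the caller should emit a stored block.
func looksIncompressible(input []byte) bool {
	var freq [256]uint16
	for _, b := range input {
		freq[b]++
	}
	abs := float64(0)
	avg := float64(len(input)) / 256
	max := float64(len(input) * 2)
	for _, v := range freq {
		diff := float64(v) - avg
		abs += diff * diff
		if abs > max {
			// Skewed histogram: compressible.
			return false
		}
	}
	return abs < max
}

func main() {
	uniform := make([]byte, 4096)
	for i := range uniform {
		uniform[i] = byte(i) // every byte value equally frequent
	}
	skewed := make([]byte, 4096) // all zeros: highly compressible
	fmt.Println(looksIncompressible(uniform), looksIncompressible(skewed))
}
```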
diff --git a/src/compress/flate/huffman_bit_writer_test.go b/src/compress/flate/huffman_bit_writer_test.go
index a57799c..dfb93e3 100644
--- a/src/compress/flate/huffman_bit_writer_test.go
+++ b/src/compress/flate/huffman_bit_writer_test.go
@@ -32,7 +32,9 @@
if strings.HasSuffix(in, ".in") {
out = in[:len(in)-len(".in")] + ".golden"
}
- testBlockHuff(t, in, out)
+ t.Run(in, func(t *testing.T) {
+ testBlockHuff(t, in, out)
+ })
}
}
@@ -44,7 +46,8 @@
}
var buf bytes.Buffer
bw := newHuffmanBitWriter(&buf)
- bw.writeBlockHuff(false, all)
+ bw.logNewTablePenalty = 8
+ bw.writeBlockHuff(false, all, false)
bw.flush()
got := buf.Bytes()
@@ -79,7 +82,7 @@
// Test if the writer produces the same output after reset.
buf.Reset()
bw.reset(&buf)
- bw.writeBlockHuff(false, all)
+ bw.writeBlockHuff(false, all, false)
bw.flush()
got = buf.Bytes()
if !bytes.Equal(got, want) {
@@ -175,13 +178,23 @@
}
}
+// TestWriteBlockDynamicSync tests if the writeBlockDynamic encoding has changed in sync mode.
+// To update the reference files use the "-update" flag on the test.
+func TestWriteBlockDynamicSync(t *testing.T) {
+ for _, test := range writeBlockTests {
+ testBlock(t, test, "sync")
+ }
+}
+
// testBlock tests a block against its references,
// or regenerate the references, if "-update" flag is set.
func testBlock(t *testing.T, test huffTest, ttype string) {
if test.want != "" {
test.want = fmt.Sprintf(test.want, ttype)
}
+ const gotSuffix = ".got"
test.wantNoInput = fmt.Sprintf(test.wantNoInput, ttype)
+ tokens := indexTokens(test.tokens)
if *update {
if test.input != "" {
t.Logf("Updating %q", test.want)
@@ -198,7 +211,7 @@
}
defer f.Close()
bw := newHuffmanBitWriter(f)
- writeToType(t, ttype, bw, test.tokens, input)
+ writeToType(t, ttype, bw, tokens, input)
}
t.Logf("Updating %q", test.wantNoInput)
@@ -209,7 +222,7 @@
}
defer f.Close()
bw := newHuffmanBitWriter(f)
- writeToType(t, ttype, bw, test.tokens, nil)
+ writeToType(t, ttype, bw, tokens, nil)
return
}
@@ -227,12 +240,12 @@
}
var buf bytes.Buffer
bw := newHuffmanBitWriter(&buf)
- writeToType(t, ttype, bw, test.tokens, input)
+ writeToType(t, ttype, bw, tokens, input)
got := buf.Bytes()
if !bytes.Equal(got, want) {
- t.Errorf("writeBlock did not yield expected result for file %q with input. See %q", test.want, test.want+".got")
- if err := os.WriteFile(test.want+".got", got, 0666); err != nil {
+ t.Errorf("writeBlock did not yield expected result for file %q with input. See %q", test.want, test.want+gotSuffix)
+ if err := os.WriteFile(test.want+gotSuffix, got, 0666); err != nil {
t.Error(err)
}
}
@@ -241,12 +254,12 @@
// Test if the writer produces the same output after reset.
buf.Reset()
bw.reset(&buf)
- writeToType(t, ttype, bw, test.tokens, input)
+ writeToType(t, ttype, bw, tokens, input)
bw.flush()
got = buf.Bytes()
if !bytes.Equal(got, want) {
- t.Errorf("reset: writeBlock did not yield expected result for file %q with input. See %q", test.want, test.want+".reset.got")
- if err := os.WriteFile(test.want+".reset.got", got, 0666); err != nil {
+ t.Errorf("reset: writeBlock did not yield expected result for file %q with input. See %q", test.want, test.want+".reset"+gotSuffix)
+ if err := os.WriteFile(test.want+".reset"+gotSuffix, got, 0666); err != nil {
t.Error(err)
}
return
@@ -262,12 +275,12 @@
}
var buf bytes.Buffer
bw := newHuffmanBitWriter(&buf)
- writeToType(t, ttype, bw, test.tokens, nil)
+ writeToType(t, ttype, bw, tokens, nil)
got := buf.Bytes()
if !bytes.Equal(got, wantNI) {
- t.Errorf("writeBlock did not yield expected result for file %q with input. See %q", test.wantNoInput, test.wantNoInput+".got")
- if err := os.WriteFile(test.want+".got", got, 0666); err != nil {
+ t.Errorf("writeBlock did not yield expected result for file %q without input. See %q", test.wantNoInput, test.wantNoInput+gotSuffix)
+ if err := os.WriteFile(test.wantNoInput+gotSuffix, got, 0666); err != nil {
t.Error(err)
}
} else if got[0]&1 == 1 {
@@ -280,12 +293,12 @@
// Test if the writer produces the same output after reset.
buf.Reset()
bw.reset(&buf)
- writeToType(t, ttype, bw, test.tokens, nil)
+ writeToType(t, ttype, bw, tokens, nil)
bw.flush()
got = buf.Bytes()
if !bytes.Equal(got, wantNI) {
- t.Errorf("reset: writeBlock did not yield expected result for file %q without input. See %q", test.want, test.want+".reset.got")
- if err := os.WriteFile(test.want+".reset.got", got, 0666); err != nil {
+ t.Errorf("reset: writeBlock did not yield expected result for file %q without input. See %q", test.wantNoInput, test.wantNoInput+".reset"+gotSuffix)
+ if err := os.WriteFile(test.wantNoInput+".reset"+gotSuffix, got, 0666); err != nil {
t.Error(err)
}
return
@@ -294,12 +307,14 @@
testWriterEOF(t, "wb", test, false)
}
-func writeToType(t *testing.T, ttype string, bw *huffmanBitWriter, tok []token, input []byte) {
+func writeToType(t *testing.T, ttype string, bw *huffmanBitWriter, tok tokens, input []byte) {
switch ttype {
case "wb":
- bw.writeBlock(tok, false, input)
+ bw.writeBlock(&tok, false, input)
case "dyn":
- bw.writeBlockDynamic(tok, false, input)
+ bw.writeBlockDynamic(&tok, false, input, false)
+ case "sync":
+ bw.writeBlockDynamic(&tok, false, input, true)
default:
panic("unknown test type")
}
@@ -332,13 +347,14 @@
}
var buf bytes.Buffer
bw := newHuffmanBitWriter(&buf)
+ tokens := indexTokens(test.tokens)
switch ttype {
case "wb":
- bw.writeBlock(test.tokens, true, input)
+ bw.writeBlock(&tokens, true, input)
case "dyn":
- bw.writeBlockDynamic(test.tokens, true, input)
+ bw.writeBlockDynamic(&tokens, true, input, true)
case "huff":
- bw.writeBlockHuff(true, input)
+ bw.writeBlockHuff(true, input, true)
default:
panic("unknown test type")
}
diff --git a/src/compress/flate/huffman_code.go b/src/compress/flate/huffman_code.go
index 6f69cab..f3e2024 100644
--- a/src/compress/flate/huffman_code.go
+++ b/src/compress/flate/huffman_code.go
@@ -7,25 +7,42 @@
import (
"math"
"math/bits"
- "sort"
+)
+
+const (
+ maxBitsLimit = 16
+ // number of valid literals
+ literalCount = 286
)
// hcode is a huffman code with a bit code and bit length.
-type hcode struct {
- code, len uint16
+type hcode uint32
+
+func (h hcode) len() uint8 {
+ return uint8(h)
+}
+
+func (h hcode) code64() uint64 {
+ return uint64(h >> 8)
+}
+
+func (h hcode) zero() bool {
+ return h == 0
}
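
The new `hcode` packs both fields of the old struct into one `uint32`: the bit length in the low 8 bits and the code in the upper 24, so both load with a single read and a zero value means "unused symbol". A small round-trip sketch mirroring the patch's accessors:

```go
package main

import "fmt"

// hcode mirrors the patch's packed representation: bit length in the low
// 8 bits, code in the upper 24 bits.
type hcode uint32

func newhcode(code uint16, length uint8) hcode {
	return hcode(length) | (hcode(code) << 8)
}

func (h hcode) len() uint8     { return uint8(h) }
func (h hcode) code64() uint64 { return uint64(h >> 8) }
func (h hcode) zero() bool     { return h == 0 }

func main() {
	h := newhcode(0x1ABC, 13)
	// Both fields round-trip; a zero hcode marks an unused symbol.
	fmt.Println(h.code64(), h.len(), h.zero())
}
```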
type huffmanEncoder struct {
- codes []hcode
- freqcache []literalNode
- bitCount [17]int32
- lns byLiteral // stored to avoid repeated allocation in generate
- lfs byFreq // stored to avoid repeated allocation in generate
+ codes []hcode
+ bitCount [17]int32
+
+ // Allocate a reusable buffer with the longest possible frequency table.
+ // Possible lengths are codegenCodeCount, offsetCodeCount and literalCount.
+ // The largest of these is literalCount, so we allocate for that case.
+ freqcache [literalCount + 1]literalNode
}
type literalNode struct {
literal uint16
- freq int32
+ freq uint16
}
// A levelInfo describes the state of the constructed tree for a given depth.
@@ -49,25 +66,34 @@
}
// set sets the code and length of an hcode.
-func (h *hcode) set(code uint16, length uint16) {
- h.len = length
- h.code = code
+func (h *hcode) set(code uint16, length uint8) {
+ *h = hcode(length) | (hcode(code) << 8)
}
-func maxNode() literalNode { return literalNode{math.MaxUint16, math.MaxInt32} }
+func newhcode(code uint16, length uint8) hcode {
+ return hcode(length) | (hcode(code) << 8)
+}
+
+func reverseBits(number uint16, bitLength byte) uint16 {
+ return bits.Reverse16(number << ((16 - bitLength) & 15))
+}
+
+func maxNode() literalNode { return literalNode{math.MaxUint16, math.MaxUint16} }
func newHuffmanEncoder(size int) *huffmanEncoder {
- return &huffmanEncoder{codes: make([]hcode, size)}
+ // Make capacity to next power of two.
+ c := uint(bits.Len32(uint32(size - 1)))
+ return &huffmanEncoder{codes: make([]hcode, size, 1<<c)}
}
// Generates a HuffmanCode corresponding to the fixed literal table.
func generateFixedLiteralEncoding() *huffmanEncoder {
- h := newHuffmanEncoder(maxNumLit)
+ h := newHuffmanEncoder(literalCount)
codes := h.codes
var ch uint16
- for ch = 0; ch < maxNumLit; ch++ {
+ for ch = range uint16(literalCount) {
var bits uint16
- var size uint16
+ var size uint8
switch {
case ch < 144:
// size 8, 000110000 .. 10111111
@@ -86,7 +112,7 @@
bits = ch + 192 - 280
size = 8
}
- codes[ch] = hcode{code: reverseBits(bits, byte(size)), len: size}
+ codes[ch] = newhcode(reverseBits(bits, size), size)
}
return h
}
@@ -95,40 +121,65 @@
h := newHuffmanEncoder(30)
codes := h.codes
for ch := range codes {
- codes[ch] = hcode{code: reverseBits(uint16(ch), 5), len: 5}
+ codes[ch] = newhcode(reverseBits(uint16(ch), 5), 5)
}
return h
}
-var fixedLiteralEncoding *huffmanEncoder = generateFixedLiteralEncoding()
-var fixedOffsetEncoding *huffmanEncoder = generateFixedOffsetEncoding()
+var fixedLiteralEncoding = generateFixedLiteralEncoding()
+var fixedOffsetEncoding = generateFixedOffsetEncoding()
-func (h *huffmanEncoder) bitLength(freq []int32) int {
+func (h *huffmanEncoder) bitLength(freq []uint16) int {
var total int
for i, f := range freq {
if f != 0 {
- total += int(f) * int(h.codes[i].len)
+ total += int(f) * int(h.codes[i].len())
}
}
return total
}
-const maxBitsLimit = 16
+func (h *huffmanEncoder) bitLengthRaw(b []byte) int {
+ var total int
+ for _, f := range b {
+ total += int(h.codes[f].len())
+ }
+ return total
+}
-// bitCounts computes the number of literals assigned to each bit size in the Huffman encoding.
-// It is only called when list.length >= 3.
+// canReuseBits returns the number of bits or math.MaxInt32 if the encoder cannot be reused.
+func (h *huffmanEncoder) canReuseBits(freq []uint16) int {
+ var total int
+ for i, f := range freq {
+ if f != 0 {
+ code := h.codes[i]
+ if code.zero() {
+ return math.MaxInt32
+ }
+ total += int(f) * int(code.len())
+ }
+ }
+ return total
+}
+
+// bitCounts returns the number of literals assigned to each bit size in the
+// Huffman encoding. It is only called when len(list) >= 3.
// The cases of 0, 1, and 2 literals are handled by special case code.
//
-// list is an array of the literals with non-zero frequencies
-// and their associated frequencies. The array is in order of increasing
-// frequency and has as its last element a special element with frequency
-// MaxInt32.
+// list is an array of the literals with non-zero frequencies and their
+// associated frequencies. The array is in order of increasing frequency
+// and has as its last element a special element with frequency MaxInt32.
//
-// maxBits is the maximum number of bits that should be used to encode any literal.
-// It must be less than 16.
+// maxBits is the maximum number of bits that should be used to encode any
+// literal. It must be less than 16.
//
-// bitCounts returns an integer slice in which slice[i] indicates the number of literals
-// that should be encoded in i bits.
+// The returned slice, at index i, gives the number of literals that should
+// be encoded in i bits.
func (h *huffmanEncoder) bitCounts(list []literalNode, maxBits int32) []int32 {
if maxBits >= maxBitsLimit {
panic("flate: maxBits too large")
@@ -154,14 +205,19 @@
// of the level j ancestor.
var leafCounts [maxBitsLimit][maxBitsLimit]int32
+ // Descending to only have 1 bounds check.
+ l2f := int32(list[2].freq)
+ l1f := int32(list[1].freq)
+ l0f := int32(list[0].freq) + int32(list[1].freq)
+
for level := int32(1); level <= maxBits; level++ {
// For every level, the first two items are the first two characters.
// We initialize the levels as if we had already figured this out.
levels[level] = levelInfo{
level: level,
- lastFreq: list[1].freq,
- nextCharFreq: list[2].freq,
- nextPairFreq: list[0].freq + list[1].freq,
+ lastFreq: l1f,
+ nextCharFreq: l2f,
+ nextPairFreq: l0f,
}
leafCounts[level][level] = 2
if level == 1 {
@@ -172,11 +228,11 @@
// We need a total of 2*n - 2 items at top level and have already generated 2.
levels[maxBits].needed = 2*n - 4
- level := maxBits
- for {
+ level := uint32(maxBits)
+ for level < 16 {
l := &levels[level]
if l.nextPairFreq == math.MaxInt32 && l.nextCharFreq == math.MaxInt32 {
// We've run out of both leaves and pairs.
// End all calculations for this level.
// To make sure we never come back to this level or any lower level,
// set nextPairFreq impossibly large.
@@ -193,14 +249,21 @@
l.lastFreq = l.nextCharFreq
// Lower leafCounts are the same of the previous node.
leafCounts[level][level] = n
- l.nextCharFreq = list[n].freq
+ e := list[n]
+ if e.literal < math.MaxUint16 {
+ l.nextCharFreq = int32(e.freq)
+ } else {
+ l.nextCharFreq = math.MaxInt32
+ }
} else {
// The next item on this row is a pair from the previous row.
// nextPairFreq isn't valid until we generate two
// more values in the level below
l.lastFreq = l.nextPairFreq
// Take leaf counts from the lower level, except counts[level] remains the same.
- copy(leafCounts[level][:level], leafCounts[level-1][:level])
+ save := leafCounts[level][level]
+ leafCounts[level] = leafCounts[level-1]
+ leafCounts[level][level] = save
levels[l.level-1].needed = 2
}
@@ -256,9 +319,9 @@
// assigned in literal order (not frequency order).
chunk := list[len(list)-int(bits):]
- h.lns.sort(chunk)
+ sortByLiteral(chunk)
for _, node := range chunk {
- h.codes[node.literal] = hcode{code: reverseBits(code, uint8(n)), len: uint16(n)}
+ h.codes[node.literal] = newhcode(reverseBits(code, uint8(n)), uint8(n))
code++
}
list = list[0 : len(list)-int(bits)]
@@ -268,15 +331,10 @@
// Update this Huffman Code object to be the minimum code for the specified frequency count.
//
// freq is an array of frequencies, in which freq[i] gives the frequency of literal i.
-// maxBits The maximum number of bits to use for any literal.
-func (h *huffmanEncoder) generate(freq []int32, maxBits int32) {
- if h.freqcache == nil {
- // Allocate a reusable buffer with the longest possible frequency table.
- // Possible lengths are codegenCodeCount, offsetCodeCount and maxNumLit.
- // The largest of these is maxNumLit, so we allocate for that case.
- h.freqcache = make([]literalNode, maxNumLit+1)
- }
+// maxBits is the maximum number of bits to use for any literal.
+func (h *huffmanEncoder) generate(freq []uint16, maxBits int32) {
list := h.freqcache[:len(freq)+1]
+ codes := h.codes[:len(freq)]
// Number of non-zero literals
count := 0
// Set list to be the set of all non-zero literals and their frequencies
@@ -285,9 +343,10 @@
list[count] = literalNode{uint16(i), f}
count++
} else {
- h.codes[i].len = 0
+ codes[i] = 0
}
}
+ list[count] = literalNode{}
list = list[:count]
if count <= 2 {
@@ -299,7 +358,7 @@
}
return
}
- h.lfs.sort(list)
+ sortByFreq(list)
// Get the number of literals for each bit count
bitCount := h.bitCounts(list, maxBits)
@@ -307,39 +366,43 @@
h.assignEncodingAndSize(bitCount, list)
}
-type byLiteral []literalNode
-
-func (s *byLiteral) sort(a []literalNode) {
- *s = byLiteral(a)
- sort.Sort(s)
+// atLeastOne clamps the result between 1 and 15.
+func atLeastOne(v float32) float32 {
+ return min(15, max(1, v))
}
-func (s byLiteral) Len() int { return len(s) }
-
-func (s byLiteral) Less(i, j int) bool {
- return s[i].literal < s[j].literal
-}
-
-func (s byLiteral) Swap(i, j int) { s[i], s[j] = s[j], s[i] }
-
-type byFreq []literalNode
-
-func (s *byFreq) sort(a []literalNode) {
- *s = byFreq(a)
- sort.Sort(s)
-}
-
-func (s byFreq) Len() int { return len(s) }
-
-func (s byFreq) Less(i, j int) bool {
- if s[i].freq == s[j].freq {
- return s[i].literal < s[j].literal
+func histogram(b []byte, h []uint16) {
+ if len(b) >= 8<<10 {
+ // Split for bigger inputs
+ histogramSplit(b, h)
+ } else {
+ h = h[:256]
+ for _, t := range b {
+ h[t]++
+ }
}
- return s[i].freq < s[j].freq
}
-func (s byFreq) Swap(i, j int) { s[i], s[j] = s[j], s[i] }
-
-func reverseBits(number uint16, bitLength byte) uint16 {
- return bits.Reverse16(number << (16 - bitLength))
+func histogramSplit(b []byte, h []uint16) {
+	// Tested: a 4-way split is slightly faster than a 2-way split.
+	// Writing to separate arrays and combining them afterwards is also slightly slower.
+ h = h[:256]
+ // Make size divisible by 4
+ for len(b)&3 != 0 {
+ h[b[0]]++
+ b = b[1:]
+ }
+ n := len(b) / 4
+ x, y, z, w := b[:n], b[n:], b[n+n:], b[n+n+n:]
+ y, z, w = y[:len(x)], z[:len(x)], w[:len(x)]
+ for i, t := range x {
+ v0 := &h[t]
+ v1 := &h[y[i]]
+ v3 := &h[w[i]]
+ v2 := &h[z[i]]
+ *v0++
+ *v1++
+ *v2++
+ *v3++
+ }
}
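The 4-way interleaved loop in histogramSplit shortens the dependency chains between increments of the same counter. A standalone sketch of the same idea, with illustrative names that are not part of this change:

```go
package main

import "fmt"

// histogram4 counts byte frequencies by walking four interleaved
// quarters of the input per iteration, mirroring histogramSplit above.
func histogram4(b []byte) [256]uint32 {
	var h [256]uint32
	// Handle the tail so the remaining length is divisible by 4.
	for len(b)&3 != 0 {
		h[b[0]]++
		b = b[1:]
	}
	n := len(b) / 4
	x, y, z, w := b[:n], b[n:2*n], b[2*n:3*n], b[3*n:]
	for i, t := range x {
		h[t]++
		h[y[i]]++
		h[z[i]]++
		h[w[i]]++
	}
	return h
}

func main() {
	h := histogram4([]byte("aaabbc"))
	fmt.Println(h['a'], h['b'], h['c'])
}
```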
diff --git a/src/compress/flate/huffman_sortByFreq.go b/src/compress/flate/huffman_sortByFreq.go
new file mode 100644
index 0000000..6c05ba8
--- /dev/null
+++ b/src/compress/flate/huffman_sortByFreq.go
@@ -0,0 +1,159 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package flate
+
+// sortByFreq sorts data by frequency, breaking ties by literal value.
+// It is an introsort (quicksort with a depth-limited fallback to heapsort)
+// and is not guaranteed to be stable.
+func sortByFreq(data []literalNode) {
+ n := len(data)
+ quickSortByFreq(data, 0, n, maxDepth(n))
+}
+
+func quickSortByFreq(data []literalNode, a, b, maxDepth int) {
+ for b-a > 12 { // Use ShellSort for slices <= 12 elements
+ if maxDepth == 0 {
+ heapSort(data, a, b)
+ return
+ }
+ maxDepth--
+ mlo, mhi := doPivotByFreq(data, a, b)
+ // Avoiding recursion on the larger subproblem guarantees
+ // a stack depth of at most lg(b-a).
+ if mlo-a < b-mhi {
+ quickSortByFreq(data, a, mlo, maxDepth)
+ a = mhi // i.e., quickSortByFreq(data, mhi, b)
+ } else {
+ quickSortByFreq(data, mhi, b, maxDepth)
+ b = mlo // i.e., quickSortByFreq(data, a, mlo)
+ }
+ }
+ if b-a > 1 {
+ // Do ShellSort pass with gap 6
+		// It can be written in this simplified form because b-a <= 12.
+ for i := a + 6; i < b; i++ {
+ if data[i].freq == data[i-6].freq && data[i].literal < data[i-6].literal || data[i].freq < data[i-6].freq {
+ data[i], data[i-6] = data[i-6], data[i]
+ }
+ }
+ insertionSortByFreq(data, a, b)
+ }
+}
+
+func doPivotByFreq(data []literalNode, lo, hi int) (midlo, midhi int) {
+ m := int(uint(lo+hi) >> 1) // Written like this to avoid integer overflow.
+ if hi-lo > 40 {
+ // Tukey's ``Ninther,'' median of three medians of three.
+ s := (hi - lo) / 8
+ medianOfThreeSortByFreq(data, lo, lo+s, lo+2*s)
+ medianOfThreeSortByFreq(data, m, m-s, m+s)
+ medianOfThreeSortByFreq(data, hi-1, hi-1-s, hi-1-2*s)
+ }
+ medianOfThreeSortByFreq(data, lo, m, hi-1)
+
+ // Invariants are:
+	//	data[lo] = pivot (set up by medianOfThreeSortByFreq)
+ // data[lo < i < a] < pivot
+ // data[a <= i < b] <= pivot
+ // data[b <= i < c] unexamined
+ // data[c <= i < hi-1] > pivot
+ // data[hi-1] >= pivot
+ pivot := lo
+ a, c := lo+1, hi-1
+
+ for ; a < c && (data[a].freq == data[pivot].freq && data[a].literal < data[pivot].literal || data[a].freq < data[pivot].freq); a++ {
+ }
+ b := a
+ for {
+ for ; b < c && (data[pivot].freq == data[b].freq && data[pivot].literal > data[b].literal || data[pivot].freq > data[b].freq); b++ { // data[b] <= pivot
+ }
+ for ; b < c && (data[pivot].freq == data[c-1].freq && data[pivot].literal < data[c-1].literal || data[pivot].freq < data[c-1].freq); c-- { // data[c-1] > pivot
+ }
+ if b >= c {
+ break
+ }
+ // data[b] > pivot; data[c-1] <= pivot
+ data[b], data[c-1] = data[c-1], data[b]
+ b++
+ c--
+ }
+ // If hi-c<3 then there are duplicates (by property of median of nine).
+ // Let's be a bit more conservative, and set border to 5.
+ protect := hi-c < 5
+ if !protect && hi-c < (hi-lo)/4 {
+		// Let's test some points for equality to the pivot
+ dups := 0
+ if data[pivot].freq == data[hi-1].freq && data[pivot].literal > data[hi-1].literal || data[pivot].freq > data[hi-1].freq { // data[hi-1] = pivot
+ data[c], data[hi-1] = data[hi-1], data[c]
+ c++
+ dups++
+ }
+ if data[b-1].freq == data[pivot].freq && data[b-1].literal > data[pivot].literal || data[b-1].freq > data[pivot].freq { // data[b-1] = pivot
+ b--
+ dups++
+ }
+ // m-lo = (hi-lo)/2 > 6
+ // b-lo > (hi-lo)*3/4-1 > 8
+ // ==> m < b ==> data[m] <= pivot
+ if data[m].freq == data[pivot].freq && data[m].literal > data[pivot].literal || data[m].freq > data[pivot].freq { // data[m] = pivot
+ data[m], data[b-1] = data[b-1], data[m]
+ b--
+ dups++
+ }
+ // if at least 2 points are equal to pivot, assume skewed distribution
+ protect = dups > 1
+ }
+ if protect {
+ // Protect against a lot of duplicates
+ // Add invariant:
+ // data[a <= i < b] unexamined
+ // data[b <= i < c] = pivot
+ for {
+ for ; a < b && (data[b-1].freq == data[pivot].freq && data[b-1].literal > data[pivot].literal || data[b-1].freq > data[pivot].freq); b-- { // data[b] == pivot
+ }
+ for ; a < b && (data[a].freq == data[pivot].freq && data[a].literal < data[pivot].literal || data[a].freq < data[pivot].freq); a++ { // data[a] < pivot
+ }
+ if a >= b {
+ break
+ }
+ // data[a] == pivot; data[b-1] < pivot
+ data[a], data[b-1] = data[b-1], data[a]
+ a++
+ b--
+ }
+ }
+ // Swap pivot into middle
+ data[pivot], data[b-1] = data[b-1], data[pivot]
+ return b - 1, c
+}
+
+// Insertion sort
+func insertionSortByFreq(data []literalNode, a, b int) {
+ for i := a + 1; i < b; i++ {
+ for j := i; j > a && (data[j].freq == data[j-1].freq && data[j].literal < data[j-1].literal || data[j].freq < data[j-1].freq); j-- {
+ data[j], data[j-1] = data[j-1], data[j]
+ }
+ }
+}
+
+// quickSortByFreq, loosely following Bentley and McIlroy,
+// ``Engineering a Sort Function,'' SP&E November 1993.
+
+// medianOfThreeSortByFreq moves the median of the three values data[m0], data[m1], data[m2] into data[m1].
+func medianOfThreeSortByFreq(data []literalNode, m1, m0, m2 int) {
+ // sort 3 elements
+ if data[m1].freq == data[m0].freq && data[m1].literal < data[m0].literal || data[m1].freq < data[m0].freq {
+ data[m1], data[m0] = data[m0], data[m1]
+ }
+ // data[m0] <= data[m1]
+ if data[m2].freq == data[m1].freq && data[m2].literal < data[m1].literal || data[m2].freq < data[m1].freq {
+ data[m2], data[m1] = data[m1], data[m2]
+ // data[m0] <= data[m2] && data[m1] < data[m2]
+ if data[m1].freq == data[m0].freq && data[m1].literal < data[m0].literal || data[m1].freq < data[m0].freq {
+ data[m1], data[m0] = data[m0], data[m1]
+ }
+ }
+ // now data[m0] <= data[m1] <= data[m2]
+}
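All the comparisons in this file expand the same two-key ordering: ascending freq, with ties broken by literal. A self-contained illustration of that ordering using sort.Slice and a local stand-in for the package-internal literalNode:

```go
package main

import (
	"fmt"
	"sort"
)

// node mirrors the shape of flate's literalNode, for illustration only.
type node struct {
	literal uint16
	freq    uint16
}

// byFreqThenLiteral applies the same ordering the hand-rolled
// quicksort above uses: ascending freq, ties broken by literal.
func byFreqThenLiteral(list []node) {
	sort.Slice(list, func(i, j int) bool {
		if list[i].freq == list[j].freq {
			return list[i].literal < list[j].literal
		}
		return list[i].freq < list[j].freq
	})
}

func main() {
	list := []node{{literal: 3, freq: 5}, {literal: 1, freq: 5}, {literal: 2, freq: 1}}
	byFreqThenLiteral(list)
	// Lowest frequency first; equal frequencies ordered by literal.
	fmt.Println(list)
}
```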
diff --git a/src/compress/flate/huffman_sortByLiteral.go b/src/compress/flate/huffman_sortByLiteral.go
new file mode 100644
index 0000000..93f1aea
--- /dev/null
+++ b/src/compress/flate/huffman_sortByLiteral.go
@@ -0,0 +1,201 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package flate
+
+// sortByLiteral sorts data by literal value.
+// It is an introsort (quicksort with a depth-limited fallback to heapsort)
+// and is not guaranteed to be stable.
+func sortByLiteral(data []literalNode) {
+ n := len(data)
+ quickSort(data, 0, n, maxDepth(n))
+}
+
+func quickSort(data []literalNode, a, b, maxDepth int) {
+ for b-a > 12 { // Use ShellSort for slices <= 12 elements
+ if maxDepth == 0 {
+ heapSort(data, a, b)
+ return
+ }
+ maxDepth--
+ mlo, mhi := doPivot(data, a, b)
+ // Avoiding recursion on the larger subproblem guarantees
+ // a stack depth of at most lg(b-a).
+ if mlo-a < b-mhi {
+ quickSort(data, a, mlo, maxDepth)
+ a = mhi // i.e., quickSort(data, mhi, b)
+ } else {
+ quickSort(data, mhi, b, maxDepth)
+ b = mlo // i.e., quickSort(data, a, mlo)
+ }
+ }
+ if b-a > 1 {
+ // Do ShellSort pass with gap 6
+		// It can be written in this simplified form because b-a <= 12.
+ for i := a + 6; i < b; i++ {
+ if data[i].literal < data[i-6].literal {
+ data[i], data[i-6] = data[i-6], data[i]
+ }
+ }
+ insertionSort(data, a, b)
+ }
+}
+func heapSort(data []literalNode, a, b int) {
+ first := a
+ lo := 0
+ hi := b - a
+
+ // Build heap with greatest element at top.
+ for i := (hi - 1) / 2; i >= 0; i-- {
+ siftDown(data, i, hi, first)
+ }
+
+ // Pop elements, largest first, into end of data.
+ for i := hi - 1; i >= 0; i-- {
+ data[first], data[first+i] = data[first+i], data[first]
+ siftDown(data, lo, i, first)
+ }
+}
+
+// siftDown implements the heap property on data[lo, hi).
+// first is an offset into the array where the root of the heap lies.
+func siftDown(data []literalNode, lo, hi, first int) {
+ root := lo
+ for {
+ child := 2*root + 1
+ if child >= hi {
+ break
+ }
+ if child+1 < hi && data[first+child].literal < data[first+child+1].literal {
+ child++
+ }
+ if data[first+root].literal > data[first+child].literal {
+ return
+ }
+ data[first+root], data[first+child] = data[first+child], data[first+root]
+ root = child
+ }
+}
+func doPivot(data []literalNode, lo, hi int) (midlo, midhi int) {
+ m := int(uint(lo+hi) >> 1) // Written like this to avoid integer overflow.
+ if hi-lo > 40 {
+ // Tukey's ``Ninther,'' median of three medians of three.
+ s := (hi - lo) / 8
+ medianOfThree(data, lo, lo+s, lo+2*s)
+ medianOfThree(data, m, m-s, m+s)
+ medianOfThree(data, hi-1, hi-1-s, hi-1-2*s)
+ }
+ medianOfThree(data, lo, m, hi-1)
+
+ // Invariants are:
+	//	data[lo] = pivot (set up by medianOfThree)
+ // data[lo < i < a] < pivot
+ // data[a <= i < b] <= pivot
+ // data[b <= i < c] unexamined
+ // data[c <= i < hi-1] > pivot
+ // data[hi-1] >= pivot
+ pivot := lo
+ a, c := lo+1, hi-1
+
+ for ; a < c && data[a].literal < data[pivot].literal; a++ {
+ }
+ b := a
+ for {
+ for ; b < c && data[pivot].literal > data[b].literal; b++ { // data[b] <= pivot
+ }
+ for ; b < c && data[pivot].literal < data[c-1].literal; c-- { // data[c-1] > pivot
+ }
+ if b >= c {
+ break
+ }
+ // data[b] > pivot; data[c-1] <= pivot
+ data[b], data[c-1] = data[c-1], data[b]
+ b++
+ c--
+ }
+ // If hi-c<3 then there are duplicates (by property of median of nine).
+ // Let's be a bit more conservative, and set border to 5.
+ protect := hi-c < 5
+ if !protect && hi-c < (hi-lo)/4 {
+		// Let's test some points for equality to the pivot
+ dups := 0
+ if data[pivot].literal > data[hi-1].literal { // data[hi-1] = pivot
+ data[c], data[hi-1] = data[hi-1], data[c]
+ c++
+ dups++
+ }
+ if data[b-1].literal > data[pivot].literal { // data[b-1] = pivot
+ b--
+ dups++
+ }
+ // m-lo = (hi-lo)/2 > 6
+ // b-lo > (hi-lo)*3/4-1 > 8
+ // ==> m < b ==> data[m] <= pivot
+ if data[m].literal > data[pivot].literal { // data[m] = pivot
+ data[m], data[b-1] = data[b-1], data[m]
+ b--
+ dups++
+ }
+ // if at least 2 points are equal to pivot, assume skewed distribution
+ protect = dups > 1
+ }
+ if protect {
+ // Protect against a lot of duplicates
+ // Add invariant:
+ // data[a <= i < b] unexamined
+ // data[b <= i < c] = pivot
+ for {
+ for ; a < b && data[b-1].literal > data[pivot].literal; b-- { // data[b] == pivot
+ }
+ for ; a < b && data[a].literal < data[pivot].literal; a++ { // data[a] < pivot
+ }
+ if a >= b {
+ break
+ }
+ // data[a] == pivot; data[b-1] < pivot
+ data[a], data[b-1] = data[b-1], data[a]
+ a++
+ b--
+ }
+ }
+ // Swap pivot into middle
+ data[pivot], data[b-1] = data[b-1], data[pivot]
+ return b - 1, c
+}
+
+// Insertion sort
+func insertionSort(data []literalNode, a, b int) {
+ for i := a + 1; i < b; i++ {
+ for j := i; j > a && data[j].literal < data[j-1].literal; j-- {
+ data[j], data[j-1] = data[j-1], data[j]
+ }
+ }
+}
+
+// maxDepth returns a threshold at which quicksort should switch
+// to heapsort. It returns 2*ceil(lg(n+1)).
+func maxDepth(n int) int {
+ var depth int
+ for i := n; i > 0; i >>= 1 {
+ depth++
+ }
+ return depth * 2
+}
+
+// medianOfThree moves the median of the three values data[m0], data[m1], data[m2] into data[m1].
+func medianOfThree(data []literalNode, m1, m0, m2 int) {
+ // sort 3 elements
+ if data[m1].literal < data[m0].literal {
+ data[m1], data[m0] = data[m0], data[m1]
+ }
+ // data[m0] <= data[m1]
+ if data[m2].literal < data[m1].literal {
+ data[m2], data[m1] = data[m1], data[m2]
+ // data[m0] <= data[m2] && data[m1] < data[m2]
+ if data[m1].literal < data[m0].literal {
+ data[m1], data[m0] = data[m0], data[m1]
+ }
+ }
+ // now data[m0] <= data[m1] <= data[m2]
+}
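maxDepth above returns 2*ceil(lg(n+1)); when quicksort recurses deeper than this, it falls back to heapSort to guarantee O(n log n) worst-case behavior. A quick standalone check of the formula (the helper name is illustrative):

```go
package main

import "fmt"

// maxDepthSketch copies the loop from maxDepth above: count the bits
// needed to represent n (which is ceil(lg(n+1))), then double it.
func maxDepthSketch(n int) int {
	depth := 0
	for i := n; i > 0; i >>= 1 {
		depth++
	}
	return depth * 2
}

func main() {
	for _, n := range []int{1, 12, 100, 1000} {
		fmt.Printf("maxDepth(%d) = %d\n", n, maxDepthSketch(n))
	}
}
```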
diff --git a/src/compress/flate/level1.go b/src/compress/flate/level1.go
new file mode 100644
index 0000000..2195df4
--- /dev/null
+++ b/src/compress/flate/level1.go
@@ -0,0 +1,197 @@
+// Copyright 2025 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package flate
+
+// Level 1 uses a single small table with 5-byte hashes.
+type fastEncL1 struct {
+ fastGen
+ table [tableSize]tableEntry
+}
+
+func (e *fastEncL1) Encode(dst *tokens, src []byte) {
+ const (
+ inputMargin = 12 - 1
+ minNonLiteralBlockSize = 1 + 1 + inputMargin
+ hashBytes = 5
+ )
+
+ // Protect against e.cur wraparound.
+ for e.cur >= bufferReset {
+ if len(e.hist) == 0 {
+ for i := range e.table[:] {
+ e.table[i] = tableEntry{}
+ }
+ e.cur = maxMatchOffset
+ break
+ }
+ // Shift down everything in the table that isn't already too far away.
+ minOff := e.cur + int32(len(e.hist)) - maxMatchOffset
+ for i := range e.table[:] {
+ v := e.table[i].offset
+ if v <= minOff {
+ v = 0
+ } else {
+ v = v - e.cur + maxMatchOffset
+ }
+ e.table[i].offset = v
+ }
+ e.cur = maxMatchOffset
+ }
+
+ s := e.addBlock(src)
+
+ if len(src) < minNonLiteralBlockSize {
+ // We do not fill the token table.
+ // This will be picked up by caller.
+ dst.n = uint16(len(src))
+ return
+ }
+
+ // Override src
+ src = e.hist
+
+ // nextEmit is where in src the next emitLiterals should start from.
+ nextEmit := s
+
+ // sLimit is when to stop looking for offset/length copies. The inputMargin
+ // lets us use a fast path for emitLiterals in the main loop, while we are
+ // looking for copies.
+ sLimit := int32(len(src) - inputMargin)
+
+ cv := loadLE64(src, s)
+
+ for {
+ const skipLog = 5
+ const doEvery = 2
+
+ nextS := s
+ var candidate tableEntry
+ var t int32
+ for {
+ nextHash := hashLen(cv, tableBits, hashBytes)
+ candidate = e.table[nextHash]
+ nextS = s + doEvery + (s-nextEmit)>>skipLog
+ if nextS > sLimit {
+ goto emitRemainder
+ }
+
+ now := loadLE64(src, nextS)
+ e.table[nextHash] = tableEntry{offset: s + e.cur}
+ nextHash = hashLen(now, tableBits, hashBytes)
+ t = candidate.offset - e.cur
+ if s-t < maxMatchOffset && uint32(cv) == loadLE32(src, t) {
+ e.table[nextHash] = tableEntry{offset: nextS + e.cur}
+ break
+ }
+
+ // Do one right away...
+ cv = now
+ s = nextS
+ nextS++
+ candidate = e.table[nextHash]
+ now >>= 8
+ e.table[nextHash] = tableEntry{offset: s + e.cur}
+
+ t = candidate.offset - e.cur
+ if s-t < maxMatchOffset && uint32(cv) == loadLE32(src, t) {
+ e.table[nextHash] = tableEntry{offset: nextS + e.cur}
+ break
+ }
+ cv = now
+ s = nextS
+ }
+
+ // A 4-byte match has been found. We'll later see if more than 4 bytes
+ // match. But, prior to the match, src[nextEmit:s] are unmatched. Emit
+ // them as literal bytes.
+ for {
+ // Invariant: we have a 4-byte match at s, and no need to emit any
+ // literal bytes prior to s.
+
+ // Extend the 4-byte match as long as possible.
+ l := e.matchlenLong(int(s+4), int(t+4), src) + 4
+
+ // Extend backwards
+ for t > 0 && s > nextEmit && loadLE8(src, t-1) == loadLE8(src, s-1) {
+ s--
+ t--
+ l++
+ }
+ if nextEmit < s {
+ for _, v := range src[nextEmit:s] {
+ dst.tokens[dst.n] = token(v)
+ dst.litHist[v]++
+ dst.n++
+ }
+ }
+
+ // Save the match found. Same as 'dst.AddMatchLong(l, uint32(s-t-baseMatchOffset))'
+ xOffset := uint32(s - t - baseMatchOffset)
+ xLength := l
+ oc := offsetCode(xOffset)
+ xOffset |= oc << 16
+ for xLength > 0 {
+ xl := xLength
+ if xl > 258 {
+ if xl > 258+baseMatchLength {
+ xl = 258
+ } else {
+ xl = 258 - baseMatchLength
+ }
+ }
+ xLength -= xl
+ xl -= baseMatchLength
+ dst.extraHist[lengthCodes1[uint8(xl)]]++
+ dst.offHist[oc]++
+ dst.tokens[dst.n] = token(matchType | uint32(xl)<<lengthShift | xOffset)
+ dst.n++
+ }
+ s += l
+ nextEmit = s
+ if nextS >= s {
+ s = nextS + 1
+ }
+ if s >= sLimit {
+ // Index first pair after match end.
+ if int(s+l+8) < len(src) {
+ cv := loadLE64(src, s)
+ e.table[hashLen(cv, tableBits, hashBytes)] = tableEntry{offset: s + e.cur}
+ }
+ goto emitRemainder
+ }
+
+			// We could immediately start working at s now, but to improve
+			// compression we first update the hash table at s-2 and at s.
+			// On GOARCH=amd64 these hash calculations are faster done from
+			// one load64 call (with some shifts) than from separate load32
+			// calls.
+ x := loadLE64(src, s-2)
+ o := e.cur + s - 2
+ prevHash := hashLen(x, tableBits, hashBytes)
+ e.table[prevHash] = tableEntry{offset: o}
+ x >>= 16
+ currHash := hashLen(x, tableBits, hashBytes)
+ candidate = e.table[currHash]
+ e.table[currHash] = tableEntry{offset: o + 2}
+
+ t = candidate.offset - e.cur
+ if s-t > maxMatchOffset || uint32(x) != loadLE32(src, t) {
+ cv = x >> 8
+ s++
+ break
+ }
+ }
+ }
+
+emitRemainder:
+ if int(nextEmit) < len(src) {
+ // If nothing was added, don't encode literals.
+ if dst.n == 0 {
+ return
+ }
+ emitLiterals(dst, src[nextEmit:])
+ }
+}
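The advance nextS = s + doEvery + (s-nextEmit)>>skipLog makes the search stride grow with the distance since the last emitted token, so level 1 skips ahead quickly over incompressible data. A sketch of that schedule with level 1's constants (the helper is illustrative):

```go
package main

import "fmt"

// step reproduces the advance used in the search loop above:
// nextS = s + doEvery + (s-nextEmit)>>skipLog, with level 1's constants.
func step(s, nextEmit int32) int32 {
	const skipLog = 5
	const doEvery = 2
	return doEvery + (s-nextEmit)>>skipLog
}

func main() {
	// The farther we are past the last emit without finding a match,
	// the larger the stride becomes.
	for _, s := range []int32{0, 32, 64, 256, 1024} {
		fmt.Printf("distance=%d stride=%d\n", s, step(s, 0))
	}
}
```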
diff --git a/src/compress/flate/level2.go b/src/compress/flate/level2.go
new file mode 100644
index 0000000..7a2fdf7
--- /dev/null
+++ b/src/compress/flate/level2.go
@@ -0,0 +1,187 @@
+// Copyright 2025 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package flate
+
+// Level 2 uses a similar algorithm to level 1, but with a larger table.
+type fastEncL2 struct {
+ fastGen
+ table [bTableSize]tableEntry
+}
+
+func (e *fastEncL2) Encode(dst *tokens, src []byte) {
+ const (
+ inputMargin = 12 - 1
+ minNonLiteralBlockSize = 1 + 1 + inputMargin
+ hashBytes = 5
+ )
+
+ // Protect against e.cur wraparound.
+ for e.cur >= bufferReset {
+ if len(e.hist) == 0 {
+ for i := range e.table[:] {
+ e.table[i] = tableEntry{}
+ }
+ e.cur = maxMatchOffset
+ break
+ }
+ // Shift down everything in the table that isn't already too far away.
+ minOff := e.cur + int32(len(e.hist)) - maxMatchOffset
+ for i := range e.table[:] {
+ v := e.table[i].offset
+ if v <= minOff {
+ v = 0
+ } else {
+ v = v - e.cur + maxMatchOffset
+ }
+ e.table[i].offset = v
+ }
+ e.cur = maxMatchOffset
+ }
+
+ s := e.addBlock(src)
+
+ if len(src) < minNonLiteralBlockSize {
+ // We do not fill the token table.
+ // This will be picked up by caller.
+ dst.n = uint16(len(src))
+ return
+ }
+
+ // Override src
+ src = e.hist
+
+ // nextEmit is where in src the next emitLiterals should start from.
+ nextEmit := s
+
+ // sLimit is when to stop looking for offset/length copies. The inputMargin
+ // lets us use a fast path for emitLiterals in the main loop, while we are
+ // looking for copies.
+ sLimit := int32(len(src) - inputMargin)
+
+ cv := loadLE64(src, s)
+ for {
+		// Start skipping ahead when no match has been found for a while.
+ const skipLog = 5
+ const doEvery = 2
+
+ nextS := s
+ var candidate tableEntry
+ for {
+ nextHash := hashLen(cv, bTableBits, hashBytes)
+ s = nextS
+ nextS = s + doEvery + (s-nextEmit)>>skipLog
+ if nextS > sLimit {
+ goto emitRemainder
+ }
+ candidate = e.table[nextHash]
+ now := loadLE64(src, nextS)
+ e.table[nextHash] = tableEntry{offset: s + e.cur}
+ nextHash = hashLen(now, bTableBits, hashBytes)
+
+ offset := s - (candidate.offset - e.cur)
+ if offset < maxMatchOffset && uint32(cv) == loadLE32(src, candidate.offset-e.cur) {
+ e.table[nextHash] = tableEntry{offset: nextS + e.cur}
+ break
+ }
+
+ // Do one right away...
+ cv = now
+ s = nextS
+ nextS++
+ candidate = e.table[nextHash]
+ now >>= 8
+ e.table[nextHash] = tableEntry{offset: s + e.cur}
+
+ offset = s - (candidate.offset - e.cur)
+ if offset < maxMatchOffset && uint32(cv) == loadLE32(src, candidate.offset-e.cur) {
+ break
+ }
+ cv = now
+ }
+
+ // A 4-byte match has been found. We'll later see if more than 4 bytes match.
+ for {
+ // Extend the 4-byte match as long as possible.
+ t := candidate.offset - e.cur
+ l := e.matchlenLong(int(s+4), int(t+4), src) + 4
+
+ // Extend backwards
+ for t > 0 && s > nextEmit && src[t-1] == src[s-1] {
+ s--
+ t--
+ l++
+ }
+ if nextEmit < s {
+ for _, v := range src[nextEmit:s] {
+ dst.tokens[dst.n] = token(v)
+ dst.litHist[v]++
+ dst.n++
+ }
+ }
+
+ dst.AddMatchLong(l, uint32(s-t-baseMatchOffset))
+ s += l
+ nextEmit = s
+ if nextS >= s {
+ s = nextS + 1
+ }
+
+ if s >= sLimit {
+ // Index first pair after match end.
+ if int(s+l+8) < len(src) {
+ cv := loadLE64(src, s)
+ e.table[hashLen(cv, bTableBits, hashBytes)] = tableEntry{offset: s + e.cur}
+ }
+ goto emitRemainder
+ }
+
+ // Store every second hash in-between, but offset by 1.
+ for i := s - l + 2; i < s-5; i += 7 {
+ x := loadLE64(src, i)
+ nextHash := hashLen(x, bTableBits, hashBytes)
+ e.table[nextHash] = tableEntry{offset: e.cur + i}
+ // Skip one
+ x >>= 16
+ nextHash = hashLen(x, bTableBits, hashBytes)
+ e.table[nextHash] = tableEntry{offset: e.cur + i + 2}
+ // Skip one
+ x >>= 16
+ nextHash = hashLen(x, bTableBits, hashBytes)
+ e.table[nextHash] = tableEntry{offset: e.cur + i + 4}
+ }
+
+			// We could immediately start working at s now, but to improve
+			// compression we first update the hash table at s-2 to s.
+ x := loadLE64(src, s-2)
+ o := e.cur + s - 2
+ prevHash := hashLen(x, bTableBits, hashBytes)
+ prevHash2 := hashLen(x>>8, bTableBits, hashBytes)
+ e.table[prevHash] = tableEntry{offset: o}
+ e.table[prevHash2] = tableEntry{offset: o + 1}
+ currHash := hashLen(x>>16, bTableBits, hashBytes)
+ candidate = e.table[currHash]
+ e.table[currHash] = tableEntry{offset: o + 2}
+
+ offset := s - (candidate.offset - e.cur)
+ if offset > maxMatchOffset || uint32(x>>16) != loadLE32(src, candidate.offset-e.cur) {
+ cv = x >> 24
+ s++
+ break
+ }
+ }
+ }
+
+emitRemainder:
+ if int(nextEmit) < len(src) {
+ // If nothing was added, don't encode literals.
+ if dst.n == 0 {
+ return
+ }
+
+ emitLiterals(dst, src[nextEmit:])
+ }
+}
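Every level begins with the same wraparound guard: once e.cur approaches bufferReset, stored offsets are either invalidated or rebased against a fresh e.cur of maxMatchOffset. A standalone sketch of that rebasing (constants and names here are illustrative):

```go
package main

import "fmt"

// maxMatchOffsetSketch stands in for flate's maxMatchOffset (32 KiB window).
const maxMatchOffsetSketch = 1 << 15

// rebase mirrors the shift-down loop above: offsets that fell outside the
// match window become 0 (they can never match again), and the rest are
// re-expressed relative to a new cur of maxMatchOffset.
func rebase(v, cur, histLen int32) int32 {
	minOff := cur + histLen - maxMatchOffsetSketch
	if v <= minOff {
		return 0
	}
	return v - cur + maxMatchOffsetSketch
}

func main() {
	cur, histLen := int32(1<<30), int32(4096)
	fmt.Println(rebase(cur+histLen-maxMatchOffsetSketch-1, cur, histLen)) // too old: dropped
	fmt.Println(rebase(cur, cur, histLen))                               // recent: rebased
}
```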
diff --git a/src/compress/flate/level3.go b/src/compress/flate/level3.go
new file mode 100644
index 0000000..adda871
--- /dev/null
+++ b/src/compress/flate/level3.go
@@ -0,0 +1,226 @@
+// Copyright 2025 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package flate
+
+// Level 3 uses a similar algorithm to level 2, with a smaller table,
+// but checks up to two candidates on each iteration and adds more
+// entries to the table.
+type fastEncL3 struct {
+ fastGen
+ table [1 << 16]tableEntryPrev
+}
+
+func (e *fastEncL3) Encode(dst *tokens, src []byte) {
+ const (
+ inputMargin = 12 - 1
+ minNonLiteralBlockSize = 1 + 1 + inputMargin
+ tableBits = 16
+ hashBytes = 5
+ )
+
+ // Protect against e.cur wraparound.
+ for e.cur >= bufferReset {
+ if len(e.hist) == 0 {
+ for i := range e.table[:] {
+ e.table[i] = tableEntryPrev{}
+ }
+ e.cur = maxMatchOffset
+ break
+ }
+ // Shift down everything in the table that isn't already too far away.
+ minOff := e.cur + int32(len(e.hist)) - maxMatchOffset
+ for i := range e.table[:] {
+ v := e.table[i]
+ if v.Cur.offset <= minOff {
+ v.Cur.offset = 0
+ } else {
+ v.Cur.offset = v.Cur.offset - e.cur + maxMatchOffset
+ }
+ if v.Prev.offset <= minOff {
+ v.Prev.offset = 0
+ } else {
+ v.Prev.offset = v.Prev.offset - e.cur + maxMatchOffset
+ }
+ e.table[i] = v
+ }
+ e.cur = maxMatchOffset
+ }
+
+ s := e.addBlock(src)
+
+ // Skip if too small.
+ if len(src) < minNonLiteralBlockSize {
+ // We do not fill the token table.
+ // This will be picked up by caller.
+ dst.n = uint16(len(src))
+ return
+ }
+
+ // Override src
+ src = e.hist
+ nextEmit := s
+
+ // sLimit is when to stop looking for offset/length copies. The inputMargin
+ // lets us use a fast path for emitLiterals in the main loop, while we are
+ // looking for copies.
+ sLimit := int32(len(src) - inputMargin)
+
+ // nextEmit is where in src the next emitLiterals should start from.
+ cv := loadLE64(src, s)
+ for {
+ const skipLog = 7
+ nextS := s
+ var candidate tableEntry
+ for {
+ nextHash := hashLen(cv, tableBits, hashBytes)
+ s = nextS
+ nextS = s + 1 + (s-nextEmit)>>skipLog
+ if nextS > sLimit {
+ goto emitRemainder
+ }
+ candidates := e.table[nextHash]
+ now := loadLE64(src, nextS)
+
+			// Oldest offset that stays within maxMatchOffset through s+4.
+ minOffset := e.cur + s - (maxMatchOffset - 4)
+ e.table[nextHash] = tableEntryPrev{Prev: candidates.Cur, Cur: tableEntry{offset: s + e.cur}}
+
+ // Check both candidates
+ candidate = candidates.Cur
+ if candidate.offset < minOffset {
+ cv = now
+ // Previous will also be invalid, we have nothing.
+ continue
+ }
+
+ if uint32(cv) == loadLE32(src, candidate.offset-e.cur) {
+ if candidates.Prev.offset < minOffset || uint32(cv) != loadLE32(src, candidates.Prev.offset-e.cur) {
+ break
+ }
+ // Both match and are valid, pick longest.
+ offset := s - (candidate.offset - e.cur)
+ o2 := s - (candidates.Prev.offset - e.cur)
+ l1, l2 := matchLen(src[s+4:], src[s-offset+4:]), matchLen(src[s+4:], src[s-o2+4:])
+ if l2 > l1 {
+ candidate = candidates.Prev
+ }
+ break
+ } else {
+ // We only check if value mismatches.
+ // Offset will always be invalid in other cases.
+ candidate = candidates.Prev
+ if candidate.offset > minOffset && uint32(cv) == loadLE32(src, candidate.offset-e.cur) {
+ break
+ }
+ }
+ cv = now
+ }
+
+ for {
+			// Extend the 4-byte match as long as possible.
+			t := candidate.offset - e.cur
+ l := e.matchlenLong(int(s+4), int(t+4), src) + 4
+
+ // Extend backwards
+ for t > 0 && s > nextEmit && src[t-1] == src[s-1] {
+ s--
+ t--
+ l++
+ }
+ // Emit literals.
+ if nextEmit < s {
+ for _, v := range src[nextEmit:s] {
+ dst.tokens[dst.n] = token(v)
+ dst.litHist[v]++
+ dst.n++
+ }
+ }
+
+ // Emit match.
+ dst.AddMatchLong(l, uint32(s-t-baseMatchOffset))
+ s += l
+ nextEmit = s
+ if nextS >= s {
+ s = nextS + 1
+ }
+
+ if s >= sLimit {
+ t += l
+ // Index first pair after match end.
+ if int(t+8) < len(src) && t > 0 {
+ cv = loadLE64(src, t)
+ nextHash := hashLen(cv, tableBits, hashBytes)
+ e.table[nextHash] = tableEntryPrev{
+ Prev: e.table[nextHash].Cur,
+ Cur: tableEntry{offset: e.cur + t},
+ }
+ }
+ goto emitRemainder
+ }
+
+			// Store a hash every 6 bytes in-between.
+ for i := s - l + 2; i < s-5; i += 6 {
+ nextHash := hashLen(loadLE64(src, i), tableBits, hashBytes)
+ e.table[nextHash] = tableEntryPrev{
+ Prev: e.table[nextHash].Cur,
+ Cur: tableEntry{offset: e.cur + i}}
+ }
+ // We could immediately start working at s now, but to improve
+ // compression we first update the hash table at s-2 to s.
+ x := loadLE64(src, s-2)
+ prevHash := hashLen(x, tableBits, hashBytes)
+
+ e.table[prevHash] = tableEntryPrev{
+ Prev: e.table[prevHash].Cur,
+ Cur: tableEntry{offset: e.cur + s - 2},
+ }
+ x >>= 8
+ prevHash = hashLen(x, tableBits, hashBytes)
+
+ e.table[prevHash] = tableEntryPrev{
+ Prev: e.table[prevHash].Cur,
+ Cur: tableEntry{offset: e.cur + s - 1},
+ }
+ x >>= 8
+ currHash := hashLen(x, tableBits, hashBytes)
+ candidates := e.table[currHash]
+ cv = x
+ e.table[currHash] = tableEntryPrev{
+ Prev: candidates.Cur,
+ Cur: tableEntry{offset: s + e.cur},
+ }
+
+ // Check both candidates
+ candidate = candidates.Cur
+ minOffset := e.cur + s - (maxMatchOffset - 4)
+
+ if candidate.offset > minOffset {
+ if uint32(cv) == loadLE32(src, candidate.offset-e.cur) {
+ // Found a match...
+ continue
+ }
+ candidate = candidates.Prev
+ if candidate.offset > minOffset && uint32(cv) == loadLE32(src, candidate.offset-e.cur) {
+ // Match at prev...
+ continue
+ }
+ }
+ cv = x >> 8
+ s++
+ break
+ }
+ }
+
+emitRemainder:
+ if int(nextEmit) < len(src) {
+ // If nothing was added, don't encode literals.
+ if dst.n == 0 {
+ return
+ }
+
+ emitLiterals(dst, src[nextEmit:])
+ }
+}
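Level 3's tableEntryPrev keeps the two most recent positions per hash bucket; every store shifts the old Cur into Prev. A minimal sketch of that update, with local stand-ins for the package-internal types:

```go
package main

import "fmt"

// Local stand-ins for the package-internal tableEntry / tableEntryPrev.
type entry struct{ offset int32 }
type entryPrev struct{ Cur, Prev entry }

// insert performs the same shift the encoder does on every store:
// the old Cur becomes Prev, and the new position becomes Cur.
func insert(b entryPrev, off int32) entryPrev {
	return entryPrev{Prev: b.Cur, Cur: entry{offset: off}}
}

func main() {
	var bucket entryPrev
	for _, off := range []int32{10, 20, 30} {
		bucket = insert(bucket, off)
	}
	// The bucket now remembers the two most recent positions.
	fmt.Println(bucket.Cur.offset, bucket.Prev.offset)
}
```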
diff --git a/src/compress/flate/level4.go b/src/compress/flate/level4.go
new file mode 100644
index 0000000..f62168b
--- /dev/null
+++ b/src/compress/flate/level4.go
@@ -0,0 +1,204 @@
+// Copyright 2025 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package flate
+
+// Level 4 uses two tables, one for short (4 bytes) and one for long (7 bytes) matches.
+type fastEncL4 struct {
+ fastGen
+ table [tableSize]tableEntry
+ bTable [tableSize]tableEntry
+}
+
+func (e *fastEncL4) Encode(dst *tokens, src []byte) {
+ const (
+ inputMargin = 12 - 1
+ minNonLiteralBlockSize = 1 + 1 + inputMargin
+ hashShortBytes = 4
+ )
+ // Protect against e.cur wraparound.
+ for e.cur >= bufferReset {
+ if len(e.hist) == 0 {
+ for i := range e.table[:] {
+ e.table[i] = tableEntry{}
+ }
+ for i := range e.bTable[:] {
+ e.bTable[i] = tableEntry{}
+ }
+ e.cur = maxMatchOffset
+ break
+ }
+ // Shift down everything in the table that isn't already too far away.
+ minOff := e.cur + int32(len(e.hist)) - maxMatchOffset
+ for i := range e.table[:] {
+ v := e.table[i].offset
+ if v <= minOff {
+ v = 0
+ } else {
+ v = v - e.cur + maxMatchOffset
+ }
+ e.table[i].offset = v
+ }
+ for i := range e.bTable[:] {
+ v := e.bTable[i].offset
+ if v <= minOff {
+ v = 0
+ } else {
+ v = v - e.cur + maxMatchOffset
+ }
+ e.bTable[i].offset = v
+ }
+ e.cur = maxMatchOffset
+ }
+
+ s := e.addBlock(src)
+
+ // This check isn't in the Snappy implementation, but there, the caller
+ // instead of the callee handles this case.
+ if len(src) < minNonLiteralBlockSize {
+ // We do not fill the token table.
+ // This will be picked up by caller.
+ dst.n = uint16(len(src))
+ return
+ }
+
+ // Override src
+ src = e.hist
+ nextEmit := s
+
+ // sLimit is when to stop looking for offset/length copies. The inputMargin
+ // lets us use a fast path for emitLiterals in the main loop, while we are
+ // looking for copies.
+ sLimit := int32(len(src) - inputMargin)
+
+ // nextEmit is where in src the next emitLiterals should start from.
+ cv := loadLE64(src, s)
+ for {
+ const skipLog = 6
+ const doEvery = 1
+
+ nextS := s
+ var t int32
+ for {
+ nextHashS := hashLen(cv, tableBits, hashShortBytes)
+ nextHashL := hash7(cv, tableBits)
+
+ s = nextS
+ nextS = s + doEvery + (s-nextEmit)>>skipLog
+ if nextS > sLimit {
+ goto emitRemainder
+ }
+ // Fetch a short+long candidate
+ sCandidate := e.table[nextHashS]
+ lCandidate := e.bTable[nextHashL]
+ next := loadLE64(src, nextS)
+ entry := tableEntry{offset: s + e.cur}
+ e.table[nextHashS] = entry
+ e.bTable[nextHashL] = entry
+
+ t = lCandidate.offset - e.cur
+ if s-t < maxMatchOffset && uint32(cv) == loadLE32(src, t) {
+ // We got a long match. Use that.
+ break
+ }
+
+ t = sCandidate.offset - e.cur
+ if s-t < maxMatchOffset && uint32(cv) == loadLE32(src, t) {
+ // Found a 4 match...
+ lCandidate = e.bTable[hash7(next, tableBits)]
+
+ // If the next long is a candidate, check if we should use that instead...
+ lOff := lCandidate.offset - e.cur
+ if nextS-lOff < maxMatchOffset && loadLE32(src, lOff) == uint32(next) {
+ l1, l2 := matchLen(src[s+4:], src[t+4:]), matchLen(src[nextS+4:], src[nextS-lOff+4:])
+ if l2 > l1 {
+ s = nextS
+ t = lCandidate.offset - e.cur
+ }
+ }
+ break
+ }
+ cv = next
+ }
+
+ // A 4-byte match has been found. We'll later see if more than 4 bytes
+ // match. But, prior to the match, src[nextEmit:s] are unmatched. Emit
+ // them as literal bytes.
+
+ // Extend the 4-byte match as long as possible.
+ l := e.matchlenLong(int(s+4), int(t+4), src) + 4
+
+ // Extend backwards
+ for t > 0 && s > nextEmit && src[t-1] == src[s-1] {
+ s--
+ t--
+ l++
+ }
+ if nextEmit < s {
+ for _, v := range src[nextEmit:s] {
+ dst.tokens[dst.n] = token(v)
+ dst.litHist[v]++
+ dst.n++
+ }
+ }
+
+ dst.AddMatchLong(l, uint32(s-t-baseMatchOffset))
+ s += l
+ nextEmit = s
+ if nextS >= s {
+ s = nextS + 1
+ }
+
+ if s >= sLimit {
+ // Index first pair after match end.
+ if int(s+8) < len(src) {
+ cv := loadLE64(src, s)
+ e.table[hashLen(cv, tableBits, hashShortBytes)] = tableEntry{offset: s + e.cur}
+ e.bTable[hash7(cv, tableBits)] = tableEntry{offset: s + e.cur}
+ }
+ goto emitRemainder
+ }
+
+ // Store every 3rd hash in-between
+ i := nextS
+ if i < s-1 {
+ cv := loadLE64(src, i)
+ t := tableEntry{offset: i + e.cur}
+ t2 := tableEntry{offset: t.offset + 1}
+ e.bTable[hash7(cv, tableBits)] = t
+ e.bTable[hash7(cv>>8, tableBits)] = t2
+ e.table[hashLen(cv>>8, tableBits, hashShortBytes)] = t2
+
+ i += 3
+ for ; i < s-1; i += 3 {
+ cv := loadLE64(src, i)
+ t := tableEntry{offset: i + e.cur}
+ t2 := tableEntry{offset: t.offset + 1}
+ e.bTable[hash7(cv, tableBits)] = t
+ e.bTable[hash7(cv>>8, tableBits)] = t2
+ e.table[hashLen(cv>>8, tableBits, hashShortBytes)] = t2
+ }
+ }
+
+ // We could immediately start working at s now, but to improve
+ // compression we first update the hash table at s-1 and at s.
+ x := loadLE64(src, s-1)
+ o := e.cur + s - 1
+ prevHashS := hashLen(x, tableBits, hashShortBytes)
+ prevHashL := hash7(x, tableBits)
+ e.table[prevHashS] = tableEntry{offset: o}
+ e.bTable[prevHashL] = tableEntry{offset: o}
+ cv = x >> 8
+ }
+
+emitRemainder:
+ if int(nextEmit) < len(src) {
+ // If nothing was added, don't encode literals.
+ if dst.n == 0 {
+ return
+ }
+
+ emitLiterals(dst, src[nextEmit:])
+ }
+}
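The inner search loop above advances with `nextS = s + doEvery + (s-nextEmit)>>skipLog`, so the stride grows as more bytes fail to match, skipping faster through incompressible data. A minimal sketch of that heuristic (the helper name `skipStep` is mine, not from the CL):

```go
package main

import "fmt"

// skipStep mirrors the level-4 advance: one byte per step, plus one
// extra byte for every 64 (1<<skipLog) unmatched bytes since nextEmit.
func skipStep(s, nextEmit int32) int32 {
	const skipLog = 6
	const doEvery = 1
	return doEvery + (s-nextEmit)>>skipLog
}

func main() {
	// Long runs without a match ramp up the stride.
	for _, run := range []int32{0, 63, 64, 256, 1024} {
		fmt.Printf("unmatched=%d step=%d\n", run, skipStep(run, 0))
	}
}
```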
diff --git a/src/compress/flate/level5.go b/src/compress/flate/level5.go
new file mode 100644
index 0000000..5ef342e
--- /dev/null
+++ b/src/compress/flate/level5.go
@@ -0,0 +1,291 @@
+// Copyright 2025 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package flate
+
+// Level 5 is similar to level 4, but tests two candidates for long matches.
+// When a match ends, the encoder also checks whether another candidate extends past that end.
+type fastEncL5 struct {
+ fastGen
+ table [tableSize]tableEntry
+ bTable [tableSize]tableEntryPrev
+}
+
+func (e *fastEncL5) Encode(dst *tokens, src []byte) {
+ const (
+ inputMargin = 12 - 1
+ minNonLiteralBlockSize = 1 + 1 + inputMargin
+ hashShortBytes = 4
+ )
+
+ // Protect against e.cur wraparound.
+ for e.cur >= bufferReset {
+ if len(e.hist) == 0 {
+ for i := range e.table[:] {
+ e.table[i] = tableEntry{}
+ }
+ for i := range e.bTable[:] {
+ e.bTable[i] = tableEntryPrev{}
+ }
+ e.cur = maxMatchOffset
+ break
+ }
+ // Shift down everything in the table that isn't already too far away.
+ minOff := e.cur + int32(len(e.hist)) - maxMatchOffset
+ for i := range e.table[:] {
+ v := e.table[i].offset
+ if v <= minOff {
+ v = 0
+ } else {
+ v = v - e.cur + maxMatchOffset
+ }
+ e.table[i].offset = v
+ }
+ for i := range e.bTable[:] {
+ v := e.bTable[i]
+ if v.Cur.offset <= minOff {
+ v.Cur.offset = 0
+ v.Prev.offset = 0
+ } else {
+ v.Cur.offset = v.Cur.offset - e.cur + maxMatchOffset
+ if v.Prev.offset <= minOff {
+ v.Prev.offset = 0
+ } else {
+ v.Prev.offset = v.Prev.offset - e.cur + maxMatchOffset
+ }
+ }
+ e.bTable[i] = v
+ }
+ e.cur = maxMatchOffset
+ }
+
+ s := e.addBlock(src)
+
+ // This check isn't in the Snappy implementation, but there, the caller
+ // instead of the callee handles this case.
+ if len(src) < minNonLiteralBlockSize {
+ // We do not fill the token table.
+ // This will be picked up by the caller.
+ dst.n = uint16(len(src))
+ return
+ }
+
+ // Override src
+ src = e.hist
+
+ // nextEmit is where in src the next emitLiterals should start from.
+ nextEmit := s
+
+ // sLimit is when to stop looking for offset/length copies. The inputMargin
+ // lets us use a fast path for emitLiterals in the main loop, while we are
+ // looking for copies.
+ sLimit := int32(len(src) - inputMargin)
+
+ cv := loadLE64(src, s)
+ for {
+ const skipLog = 6
+ const doEvery = 1
+
+ nextS := s
+ var l int32
+ var t int32
+ for {
+ nextHashS := hashLen(cv, tableBits, hashShortBytes)
+ nextHashL := hash7(cv, tableBits)
+
+ s = nextS
+ nextS = s + doEvery + (s-nextEmit)>>skipLog
+ if nextS > sLimit {
+ goto emitRemainder
+ }
+ // Fetch a short+long candidate
+ sCandidate := e.table[nextHashS]
+ lCandidate := e.bTable[nextHashL]
+ next := loadLE64(src, nextS)
+ entry := tableEntry{offset: s + e.cur}
+ e.table[nextHashS] = entry
+ eLong := &e.bTable[nextHashL]
+ eLong.Cur, eLong.Prev = entry, eLong.Cur
+
+ nextHashS = hashLen(next, tableBits, hashShortBytes)
+ nextHashL = hash7(next, tableBits)
+
+ t = lCandidate.Cur.offset - e.cur
+ if s-t < maxMatchOffset {
+ if uint32(cv) == loadLE32(src, t) {
+ // Store the next match
+ e.table[nextHashS] = tableEntry{offset: nextS + e.cur}
+ eLong := &e.bTable[nextHashL]
+ eLong.Cur, eLong.Prev = tableEntry{offset: nextS + e.cur}, eLong.Cur
+
+ t2 := lCandidate.Prev.offset - e.cur
+ if s-t2 < maxMatchOffset && uint32(cv) == loadLE32(src, t2) {
+ l = e.matchLenLimited(int(s+4), int(t+4), src) + 4
+ ml1 := e.matchLenLimited(int(s+4), int(t2+4), src) + 4
+ if ml1 > l {
+ t = t2
+ l = ml1
+ break
+ }
+ }
+ break
+ }
+ t = lCandidate.Prev.offset - e.cur
+ if s-t < maxMatchOffset && uint32(cv) == loadLE32(src, t) {
+ // Store the next match
+ e.table[nextHashS] = tableEntry{offset: nextS + e.cur}
+ eLong := &e.bTable[nextHashL]
+ eLong.Cur, eLong.Prev = tableEntry{offset: nextS + e.cur}, eLong.Cur
+ break
+ }
+ }
+
+ t = sCandidate.offset - e.cur
+ if s-t < maxMatchOffset && uint32(cv) == loadLE32(src, t) {
+ // Found a 4-byte match...
+ l = e.matchLenLimited(int(s+4), int(t+4), src) + 4
+ lCandidate = e.bTable[nextHashL]
+ // Store the next match
+
+ e.table[nextHashS] = tableEntry{offset: nextS + e.cur}
+ eLong := &e.bTable[nextHashL]
+ eLong.Cur, eLong.Prev = tableEntry{offset: nextS + e.cur}, eLong.Cur
+
+ // If the next long is a candidate, use that...
+ t2 := lCandidate.Cur.offset - e.cur
+ if nextS-t2 < maxMatchOffset {
+ if loadLE32(src, t2) == uint32(next) {
+ ml := e.matchLenLimited(int(nextS+4), int(t2+4), src) + 4
+ if ml > l {
+ t = t2
+ s = nextS
+ l = ml
+ break
+ }
+ }
+ // If the previous long is a candidate, use that...
+ t2 = lCandidate.Prev.offset - e.cur
+ if nextS-t2 < maxMatchOffset && loadLE32(src, t2) == uint32(next) {
+ ml := e.matchLenLimited(int(nextS+4), int(t2+4), src) + 4
+ if ml > l {
+ t = t2
+ s = nextS
+ l = ml
+ break
+ }
+ }
+ }
+ break
+ }
+ cv = next
+ }
+
+ if l == 0 {
+ // Extend the 4-byte match as long as possible.
+ l = e.matchlenLong(int(s+4), int(t+4), src) + 4
+ } else if l == maxMatchLength {
+ l += e.matchlenLong(int(s+l), int(t+l), src)
+ }
+
+ // Try to locate a better match by checking the end of best match...
+ if sAt := s + l; l < 30 && sAt < sLimit {
+ // Allow some bytes at the beginning to mismatch.
+ // The sweet spot is 2-3 bytes, depending on the input.
+ // Skipping 3 is only slightly better when it helps, but sometimes much worse.
+ // The skipped bytes are tested in the "extend backwards" step below,
+ // and are still included in the match if they turn out to match.
+ const skipBeginning = 2
+ eLong := e.bTable[hash7(loadLE64(src, sAt), tableBits)].Cur.offset
+ t2 := eLong - e.cur - l + skipBeginning
+ s2 := s + skipBeginning
+ off := s2 - t2
+ if t2 >= 0 && off < maxMatchOffset && off > 0 {
+ if l2 := e.matchlenLong(int(s2), int(t2), src); l2 > l {
+ t = t2
+ l = l2
+ s = s2
+ }
+ }
+ }
+
+ // Extend backwards
+ for t > 0 && s > nextEmit && src[t-1] == src[s-1] {
+ s--
+ t--
+ l++
+ }
+ if nextEmit < s {
+ for _, v := range src[nextEmit:s] {
+ dst.tokens[dst.n] = token(v)
+ dst.litHist[v]++
+ dst.n++
+ }
+ }
+
+ dst.AddMatchLong(l, uint32(s-t-baseMatchOffset))
+ s += l
+ nextEmit = s
+ if nextS >= s {
+ s = nextS + 1
+ }
+
+ if s >= sLimit {
+ goto emitRemainder
+ }
+
+ // Store every 3rd hash in-between.
+ const hashEvery = 3
+ i := s - l + 1
+ if i < s-1 {
+ cv := loadLE64(src, i)
+ t := tableEntry{offset: i + e.cur}
+ e.table[hashLen(cv, tableBits, hashShortBytes)] = t
+ eLong := &e.bTable[hash7(cv, tableBits)]
+ eLong.Cur, eLong.Prev = t, eLong.Cur
+
+ // Do a long entry at i+1
+ cv >>= 8
+ t = tableEntry{offset: t.offset + 1}
+ eLong = &e.bTable[hash7(cv, tableBits)]
+ eLong.Cur, eLong.Prev = t, eLong.Cur
+
+ // We only have enough bits for a short entry at i+2
+ cv >>= 8
+ t = tableEntry{offset: t.offset + 1}
+ e.table[hashLen(cv, tableBits, hashShortBytes)] = t
+
+ // Skip one - otherwise we risk hitting 's'
+ i += 4
+ for ; i < s-1; i += hashEvery {
+ cv := loadLE64(src, i)
+ t := tableEntry{offset: i + e.cur}
+ t2 := tableEntry{offset: t.offset + 1}
+ eLong := &e.bTable[hash7(cv, tableBits)]
+ eLong.Cur, eLong.Prev = t, eLong.Cur
+ e.table[hashLen(cv>>8, tableBits, hashShortBytes)] = t2
+ }
+ }
+
+ // We could immediately start working at s now, but to improve
+ // compression we first update the hash table at s-1 and at s.
+ x := loadLE64(src, s-1)
+ o := e.cur + s - 1
+ prevHashS := hashLen(x, tableBits, hashShortBytes)
+ prevHashL := hash7(x, tableBits)
+ e.table[prevHashS] = tableEntry{offset: o}
+ eLong := &e.bTable[prevHashL]
+ eLong.Cur, eLong.Prev = tableEntry{offset: o}, eLong.Cur
+ cv = x >> 8
+ }
+
+emitRemainder:
+ if int(nextEmit) < len(src) {
+ // If nothing was added, don't encode literals.
+ if dst.n == 0 {
+ return
+ }
+
+ emitLiterals(dst, src[nextEmit:])
+ }
+}
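Level 5's long table keeps two candidates per bucket via `tableEntryPrev`: every insert demotes the current entry to `Prev` (`eLong.Cur, eLong.Prev = entry, eLong.Cur`), so both recent positions can be probed. A self-contained sketch of just that bucket update:

```go
package main

import "fmt"

type tableEntry struct{ offset int32 }

// tableEntryPrev keeps the two most recent positions that hashed to a
// bucket, so both can be tried as match candidates.
type tableEntryPrev struct{ Cur, Prev tableEntry }

// insert pushes a new entry and demotes the current one to Prev,
// mirroring "eLong.Cur, eLong.Prev = entry, eLong.Cur" in level5.go.
func (e *tableEntryPrev) insert(off int32) {
	e.Cur, e.Prev = tableEntry{offset: off}, e.Cur
}

func main() {
	var b tableEntryPrev
	b.insert(10)
	b.insert(20)
	fmt.Println(b.Cur.offset, b.Prev.offset) // most recent first: 20 10
}
```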
diff --git a/src/compress/flate/level6.go b/src/compress/flate/level6.go
new file mode 100644
index 0000000..851a715
--- /dev/null
+++ b/src/compress/flate/level6.go
@@ -0,0 +1,301 @@
+// Copyright 2025 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package flate
+
+// Level 6 extends level 5, but does "repeat offset" check,
+// as well as adding more hash entries to the tables.
+type fastEncL6 struct {
+ fastGen
+ table [tableSize]tableEntry
+ bTable [tableSize]tableEntryPrev
+}
+
+func (e *fastEncL6) Encode(dst *tokens, src []byte) {
+ const (
+ inputMargin = 12 - 1
+ minNonLiteralBlockSize = 1 + 1 + inputMargin
+ hashShortBytes = 4
+ )
+
+ // Protect against e.cur wraparound.
+ for e.cur >= bufferReset {
+ if len(e.hist) == 0 {
+ for i := range e.table[:] {
+ e.table[i] = tableEntry{}
+ }
+ for i := range e.bTable[:] {
+ e.bTable[i] = tableEntryPrev{}
+ }
+ e.cur = maxMatchOffset
+ break
+ }
+ // Shift down everything in the table that isn't already too far away.
+ minOff := e.cur + int32(len(e.hist)) - maxMatchOffset
+ for i := range e.table[:] {
+ v := e.table[i].offset
+ if v <= minOff {
+ v = 0
+ } else {
+ v = v - e.cur + maxMatchOffset
+ }
+ e.table[i].offset = v
+ }
+ for i := range e.bTable[:] {
+ v := e.bTable[i]
+ if v.Cur.offset <= minOff {
+ v.Cur.offset = 0
+ v.Prev.offset = 0
+ } else {
+ v.Cur.offset = v.Cur.offset - e.cur + maxMatchOffset
+ if v.Prev.offset <= minOff {
+ v.Prev.offset = 0
+ } else {
+ v.Prev.offset = v.Prev.offset - e.cur + maxMatchOffset
+ }
+ }
+ e.bTable[i] = v
+ }
+ e.cur = maxMatchOffset
+ }
+
+ s := e.addBlock(src)
+
+ if len(src) < minNonLiteralBlockSize {
+ // We do not fill the token table.
+ // This will be picked up by the caller.
+ dst.n = uint16(len(src))
+ return
+ }
+
+ // Override src
+ src = e.hist
+
+ // nextEmit is where in src the next emitLiterals should start from.
+ nextEmit := s
+
+ // sLimit is when to stop looking for offset/length copies. The inputMargin
+ // lets us use a fast path for emitLiterals in the main loop, while we are
+ // looking for copies.
+ sLimit := int32(len(src) - inputMargin)
+
+ cv := loadLE64(src, s)
+ // Repeat MUST be > 1 and within range
+ repeat := int32(1)
+ for {
+ const skipLog = 7
+ const doEvery = 1
+
+ nextS := s
+ var l int32
+ var t int32
+ for {
+ nextHashS := hashLen(cv, tableBits, hashShortBytes)
+ nextHashL := hash7(cv, tableBits)
+ s = nextS
+ nextS = s + doEvery + (s-nextEmit)>>skipLog
+ if nextS > sLimit {
+ goto emitRemainder
+ }
+ // Fetch a short+long candidate
+ sCandidate := e.table[nextHashS]
+ lCandidate := e.bTable[nextHashL]
+ next := loadLE64(src, nextS)
+ entry := tableEntry{offset: s + e.cur}
+ e.table[nextHashS] = entry
+ eLong := &e.bTable[nextHashL]
+ eLong.Cur, eLong.Prev = entry, eLong.Cur
+
+ // Calculate hashes of 'next'
+ nextHashS = hashLen(next, tableBits, hashShortBytes)
+ nextHashL = hash7(next, tableBits)
+
+ t = lCandidate.Cur.offset - e.cur
+ if s-t < maxMatchOffset {
+ if uint32(cv) == loadLE32(src, t) {
+ // Long candidate matches at least 4 bytes.
+
+ // Store the next match
+ e.table[nextHashS] = tableEntry{offset: nextS + e.cur}
+ eLong := &e.bTable[nextHashL]
+ eLong.Cur, eLong.Prev = tableEntry{offset: nextS + e.cur}, eLong.Cur
+
+ // Check the previous long candidate as well.
+ t2 := lCandidate.Prev.offset - e.cur
+ if s-t2 < maxMatchOffset && uint32(cv) == loadLE32(src, t2) {
+ l = e.matchLenLimited(int(s+4), int(t+4), src) + 4
+ ml1 := e.matchLenLimited(int(s+4), int(t2+4), src) + 4
+ if ml1 > l {
+ t = t2
+ l = ml1
+ break
+ }
+ }
+ break
+ }
+ // Current value did not match, but check if previous long value does.
+ t = lCandidate.Prev.offset - e.cur
+ if s-t < maxMatchOffset && uint32(cv) == loadLE32(src, t) {
+ // Store the next match
+ e.table[nextHashS] = tableEntry{offset: nextS + e.cur}
+ eLong := &e.bTable[nextHashL]
+ eLong.Cur, eLong.Prev = tableEntry{offset: nextS + e.cur}, eLong.Cur
+ break
+ }
+ }
+
+ t = sCandidate.offset - e.cur
+ if s-t < maxMatchOffset && uint32(cv) == loadLE32(src, t) {
+ // Found a 4-byte match...
+ l = e.matchLenLimited(int(s+4), int(t+4), src) + 4
+
+ // Look up next long candidate (at nextS)
+ lCandidate = e.bTable[nextHashL]
+
+ // Store the next match
+ e.table[nextHashS] = tableEntry{offset: nextS + e.cur}
+ eLong := &e.bTable[nextHashL]
+ eLong.Cur, eLong.Prev = tableEntry{offset: nextS + e.cur}, eLong.Cur
+
+ // Check repeat at s + repOff
+ const repOff = 1
+ t2 := s - repeat + repOff
+ if loadLE32(src, t2) == uint32(cv>>(8*repOff)) {
+ ml := e.matchLenLimited(int(s+4+repOff), int(t2+4), src) + 4
+ if ml > l {
+ t = t2
+ l = ml
+ s += repOff
+ // Not worth checking more.
+ break
+ }
+ }
+
+ // If the next long is a candidate, use that...
+ t2 = lCandidate.Cur.offset - e.cur
+ if nextS-t2 < maxMatchOffset {
+ if loadLE32(src, t2) == uint32(next) {
+ ml := e.matchLenLimited(int(nextS+4), int(t2+4), src) + 4
+ if ml > l {
+ t = t2
+ s = nextS
+ l = ml
+ // This is ok, but check previous as well.
+ }
+ }
+ // If the previous long is a candidate, use that...
+ t2 = lCandidate.Prev.offset - e.cur
+ if nextS-t2 < maxMatchOffset && loadLE32(src, t2) == uint32(next) {
+ ml := e.matchLenLimited(int(nextS+4), int(t2+4), src) + 4
+ if ml > l {
+ t = t2
+ s = nextS
+ l = ml
+ break
+ }
+ }
+ }
+ break
+ }
+ cv = next
+ }
+
+ // Extend the 4-byte match as long as possible.
+ if l == 0 {
+ l = e.matchlenLong(int(s+4), int(t+4), src) + 4
+ } else if l == maxMatchLength {
+ l += e.matchlenLong(int(s+l), int(t+l), src)
+ }
+
+ // Try to locate a better match by checking the end-of-match...
+ if sAt := s + l; sAt < sLimit {
+ // Allow some bytes at the beginning to mismatch.
+ // The sweet spot is 2-3 bytes, depending on the input.
+ // Skipping 3 is only slightly better when it helps, but sometimes much worse.
+ // The skipped bytes are tested in the "extend backwards" step below,
+ // and are still included in the match if they turn out to match.
+ const skipBeginning = 2
+ eLong := &e.bTable[hash7(loadLE64(src, sAt), tableBits)]
+ // Test current
+ t2 := eLong.Cur.offset - e.cur - l + skipBeginning
+ s2 := s + skipBeginning
+ off := s2 - t2
+ if off < maxMatchOffset {
+ if off > 0 && t2 >= 0 {
+ if l2 := e.matchlenLong(int(s2), int(t2), src); l2 > l {
+ t = t2
+ l = l2
+ s = s2
+ }
+ }
+ // Test previous entry:
+ t2 = eLong.Prev.offset - e.cur - l + skipBeginning
+ off := s2 - t2
+ if off > 0 && off < maxMatchOffset && t2 >= 0 {
+ if l2 := e.matchlenLong(int(s2), int(t2), src); l2 > l {
+ t = t2
+ l = l2
+ s = s2
+ }
+ }
+ }
+ }
+
+ // Extend backwards
+ for t > 0 && s > nextEmit && src[t-1] == src[s-1] {
+ s--
+ t--
+ l++
+ }
+ if nextEmit < s {
+ for _, v := range src[nextEmit:s] {
+ dst.tokens[dst.n] = token(v)
+ dst.litHist[v]++
+ dst.n++
+ }
+ }
+
+ dst.AddMatchLong(l, uint32(s-t-baseMatchOffset))
+ repeat = s - t
+ s += l
+ nextEmit = s
+ if nextS >= s {
+ s = nextS + 1
+ }
+
+ if s >= sLimit {
+ // Index after match end.
+ for i := nextS + 1; i < int32(len(src))-8; i += 2 {
+ cv := loadLE64(src, i)
+ e.table[hashLen(cv, tableBits, hashShortBytes)] = tableEntry{offset: i + e.cur}
+ eLong := &e.bTable[hash7(cv, tableBits)]
+ eLong.Cur, eLong.Prev = tableEntry{offset: i + e.cur}, eLong.Cur
+ }
+ goto emitRemainder
+ }
+
+ // Store every long hash in-between and every second short.
+ for i := nextS + 1; i < s-1; i += 2 {
+ cv := loadLE64(src, i)
+ t := tableEntry{offset: i + e.cur}
+ t2 := tableEntry{offset: t.offset + 1}
+ eLong := &e.bTable[hash7(cv, tableBits)]
+ eLong2 := &e.bTable[hash7(cv>>8, tableBits)]
+ e.table[hashLen(cv, tableBits, hashShortBytes)] = t
+ eLong.Cur, eLong.Prev = t, eLong.Cur
+ eLong2.Cur, eLong2.Prev = t2, eLong2.Cur
+ }
+ cv = loadLE64(src, s)
+ }
+
+emitRemainder:
+ if int(nextEmit) < len(src) {
+ // If nothing was added, don't encode literals.
+ if dst.n == 0 {
+ return
+ }
+
+ emitLiterals(dst, src[nextEmit:])
+ }
+}
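Level 6's extra trick is the repeat-offset check: it remembers the distance of the previous match (`repeat = s - t`) and cheaply tests that distance before trusting its hash candidates, which pays off on periodic data. A minimal sketch of the idea (the helper `hasRepeatMatch` is mine, simplified from the real code, which also offsets by `repOff`):

```go
package main

import "fmt"

// hasRepeatMatch reports whether at least four bytes at position s
// match the bytes one "repeat" distance back. This is the cheap probe
// level 6 performs before examining its hash-table candidates.
func hasRepeatMatch(src []byte, s, repeat int) bool {
	t := s - repeat
	if t < 0 || s+4 > len(src) {
		return false
	}
	for i := 0; i < 4; i++ {
		if src[s+i] != src[t+i] {
			return false
		}
	}
	return true
}

func main() {
	src := []byte("abcabcabcabc")
	fmt.Println(hasRepeatMatch(src, 6, 3)) // period-3 data repeats at distance 3
	fmt.Println(hasRepeatMatch(src, 6, 4)) // but not at distance 4
}
```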
diff --git a/src/compress/flate/regmask_amd64.go b/src/compress/flate/regmask_amd64.go
new file mode 100644
index 0000000..cd1469a
--- /dev/null
+++ b/src/compress/flate/regmask_amd64.go
@@ -0,0 +1,14 @@
+// Copyright 2025 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package flate
+
+const (
+ // Masks for shifts with register sizes of the shift value.
+ // This can be used to work around the x86 design of shifting by mod register size.
+ // It can be used when a variable shift is always smaller than the register size.
+
+ // reg8SizeMask64 - shift value is 8 bits on 64 bit register.
+ reg8SizeMask64 = 63
+)
diff --git a/src/compress/flate/regmask_other.go b/src/compress/flate/regmask_other.go
new file mode 100644
index 0000000..e25fc87
--- /dev/null
+++ b/src/compress/flate/regmask_other.go
@@ -0,0 +1,18 @@
+// Copyright 2025 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !amd64
+// +build !amd64
+
+package flate
+
+const (
+ // Masks for shifts with register sizes of the shift value.
+ // This can be used to work around the x86 design of shifting by mod register size.
+ // On other platforms the mask is ineffective so the AND can be removed by the compiler.
+ // It can be used when a variable shift is always smaller than the register size.
+
+ // reg8SizeMask64 - shift value is 8 bits on 64 bit register.
+ reg8SizeMask64 = 0xff
+)
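The intent of these masks: x86 reduces 64-bit shift counts mod 64 in hardware, so masking the count with 63 lets the compiler emit a bare shift instead of the bounds logic Go's shift semantics otherwise require; on other platforms the 0xff mask is a no-op on an 8-bit count and is removed. This is only valid when, as the comments say, the shift is always smaller than the register size. A small sketch under that assumption (values of `n` stay below 64):

```go
package main

import "fmt"

const reg8SizeMask64 = 63 // 0xff on non-amd64, where the AND is a no-op

// shiftRight performs a variable right shift with an 8-bit count.
// Masking the count tells the compiler it is below the register size,
// allowing a single shift instruction on amd64.
func shiftRight(v uint64, n uint8) uint64 {
	return v >> (n & reg8SizeMask64)
}

func main() {
	fmt.Println(shiftRight(1<<40, 40)) // 1
	fmt.Println(shiftRight(256, 8))    // 1
}
```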
diff --git a/src/compress/flate/testdata/huffman-null-max.sync.expect b/src/compress/flate/testdata/huffman-null-max.sync.expect
new file mode 100644
index 0000000..c081651
--- /dev/null
+++ b/src/compress/flate/testdata/huffman-null-max.sync.expect
Binary files differ
diff --git a/src/compress/flate/testdata/huffman-null-max.sync.expect-noinput b/src/compress/flate/testdata/huffman-null-max.sync.expect-noinput
new file mode 100644
index 0000000..c081651
--- /dev/null
+++ b/src/compress/flate/testdata/huffman-null-max.sync.expect-noinput
Binary files differ
diff --git a/src/compress/flate/testdata/huffman-pi.sync.expect b/src/compress/flate/testdata/huffman-pi.sync.expect
new file mode 100644
index 0000000..e4396ac
--- /dev/null
+++ b/src/compress/flate/testdata/huffman-pi.sync.expect
Binary files differ
diff --git a/src/compress/flate/testdata/huffman-pi.sync.expect-noinput b/src/compress/flate/testdata/huffman-pi.sync.expect-noinput
new file mode 100644
index 0000000..e4396ac
--- /dev/null
+++ b/src/compress/flate/testdata/huffman-pi.sync.expect-noinput
Binary files differ
diff --git a/src/compress/flate/testdata/huffman-rand-1k.dyn.expect-noinput b/src/compress/flate/testdata/huffman-rand-1k.dyn.expect-noinput
index 0c24742..016db55 100644
--- a/src/compress/flate/testdata/huffman-rand-1k.dyn.expect-noinput
+++ b/src/compress/flate/testdata/huffman-rand-1k.dyn.expect-noinput
Binary files differ
diff --git a/src/compress/flate/testdata/huffman-rand-1k.sync.expect b/src/compress/flate/testdata/huffman-rand-1k.sync.expect
new file mode 100644
index 0000000..09dc798
--- /dev/null
+++ b/src/compress/flate/testdata/huffman-rand-1k.sync.expect
Binary files differ
diff --git a/src/compress/flate/testdata/huffman-rand-1k.sync.expect-noinput b/src/compress/flate/testdata/huffman-rand-1k.sync.expect-noinput
new file mode 100644
index 0000000..0c24742
--- /dev/null
+++ b/src/compress/flate/testdata/huffman-rand-1k.sync.expect-noinput
Binary files differ
diff --git a/src/compress/flate/testdata/huffman-rand-limit.dyn.expect b/src/compress/flate/testdata/huffman-rand-limit.dyn.expect
index 2d65279..881e59c 100644
--- a/src/compress/flate/testdata/huffman-rand-limit.dyn.expect
+++ b/src/compress/flate/testdata/huffman-rand-limit.dyn.expect
Binary files differ
diff --git a/src/compress/flate/testdata/huffman-rand-limit.dyn.expect-noinput b/src/compress/flate/testdata/huffman-rand-limit.dyn.expect-noinput
index 2d65279..881e59c 100644
--- a/src/compress/flate/testdata/huffman-rand-limit.dyn.expect-noinput
+++ b/src/compress/flate/testdata/huffman-rand-limit.dyn.expect-noinput
Binary files differ
diff --git a/src/compress/flate/testdata/huffman-rand-limit.golden b/src/compress/flate/testdata/huffman-rand-limit.golden
index 57e5932..9ca0eb1 100644
--- a/src/compress/flate/testdata/huffman-rand-limit.golden
+++ b/src/compress/flate/testdata/huffman-rand-limit.golden
Binary files differ
diff --git a/src/compress/flate/testdata/huffman-rand-limit.sync.expect b/src/compress/flate/testdata/huffman-rand-limit.sync.expect
new file mode 100644
index 0000000..881e59c
--- /dev/null
+++ b/src/compress/flate/testdata/huffman-rand-limit.sync.expect
Binary files differ
diff --git a/src/compress/flate/testdata/huffman-rand-limit.sync.expect-noinput b/src/compress/flate/testdata/huffman-rand-limit.sync.expect-noinput
new file mode 100644
index 0000000..881e59c
--- /dev/null
+++ b/src/compress/flate/testdata/huffman-rand-limit.sync.expect-noinput
Binary files differ
diff --git a/src/compress/flate/testdata/huffman-shifts.sync.expect b/src/compress/flate/testdata/huffman-shifts.sync.expect
new file mode 100644
index 0000000..7812c1c
--- /dev/null
+++ b/src/compress/flate/testdata/huffman-shifts.sync.expect
Binary files differ
diff --git a/src/compress/flate/testdata/huffman-shifts.sync.expect-noinput b/src/compress/flate/testdata/huffman-shifts.sync.expect-noinput
new file mode 100644
index 0000000..7812c1c
--- /dev/null
+++ b/src/compress/flate/testdata/huffman-shifts.sync.expect-noinput
Binary files differ
diff --git a/src/compress/flate/testdata/huffman-text-shift.sync.expect b/src/compress/flate/testdata/huffman-text-shift.sync.expect
new file mode 100644
index 0000000..71ce3ae
--- /dev/null
+++ b/src/compress/flate/testdata/huffman-text-shift.sync.expect
Binary files differ
diff --git a/src/compress/flate/testdata/huffman-text-shift.sync.expect-noinput b/src/compress/flate/testdata/huffman-text-shift.sync.expect-noinput
new file mode 100644
index 0000000..71ce3ae
--- /dev/null
+++ b/src/compress/flate/testdata/huffman-text-shift.sync.expect-noinput
Binary files differ
diff --git a/src/compress/flate/testdata/huffman-text.sync.expect b/src/compress/flate/testdata/huffman-text.sync.expect
new file mode 100644
index 0000000..d448727
--- /dev/null
+++ b/src/compress/flate/testdata/huffman-text.sync.expect
Binary files differ
diff --git a/src/compress/flate/testdata/huffman-text.sync.expect-noinput b/src/compress/flate/testdata/huffman-text.sync.expect-noinput
new file mode 100644
index 0000000..d448727
--- /dev/null
+++ b/src/compress/flate/testdata/huffman-text.sync.expect-noinput
Binary files differ
diff --git a/src/compress/flate/testdata/huffman-zero.dyn.expect b/src/compress/flate/testdata/huffman-zero.dyn.expect
index 830348a..dbe401c 100644
--- a/src/compress/flate/testdata/huffman-zero.dyn.expect
+++ b/src/compress/flate/testdata/huffman-zero.dyn.expect
Binary files differ
diff --git a/src/compress/flate/testdata/huffman-zero.dyn.expect-noinput b/src/compress/flate/testdata/huffman-zero.dyn.expect-noinput
index 830348a..dbe401c 100644
--- a/src/compress/flate/testdata/huffman-zero.dyn.expect-noinput
+++ b/src/compress/flate/testdata/huffman-zero.dyn.expect-noinput
Binary files differ
diff --git a/src/compress/flate/testdata/huffman-zero.sync.expect b/src/compress/flate/testdata/huffman-zero.sync.expect
new file mode 100644
index 0000000..dbe401c
--- /dev/null
+++ b/src/compress/flate/testdata/huffman-zero.sync.expect
Binary files differ
diff --git a/src/compress/flate/testdata/huffman-zero.sync.expect-noinput b/src/compress/flate/testdata/huffman-zero.sync.expect-noinput
new file mode 100644
index 0000000..dbe401c
--- /dev/null
+++ b/src/compress/flate/testdata/huffman-zero.sync.expect-noinput
Binary files differ
diff --git a/src/compress/flate/testdata/null-long-match.sync.expect-noinput b/src/compress/flate/testdata/null-long-match.sync.expect-noinput
new file mode 100644
index 0000000..8b92d9f
--- /dev/null
+++ b/src/compress/flate/testdata/null-long-match.sync.expect-noinput
Binary files differ
diff --git a/src/compress/flate/token.go b/src/compress/flate/token.go
index fc0e494..3f0d1c3 100644
--- a/src/compress/flate/token.go
+++ b/src/compress/flate/token.go
@@ -4,20 +4,26 @@
package flate
+import (
+ "math"
+)
+
const (
- // 2 bits: type 0 = literal 1=EOF 2=Match 3=Unused
- // 8 bits: xlength = length - MIN_MATCH_LENGTH
- // 22 bits xoffset = offset - MIN_OFFSET_SIZE, or literal
- lengthShift = 22
- offsetMask = 1<<lengthShift - 1
- typeMask = 3 << 30
- literalType = 0 << 30
- matchType = 1 << 30
+ // Token is a compound value:
+ // bits 0-16 xoffset = offset - MIN_OFFSET_SIZE, or literal - 16 bits
+ // bits 16-21 offsetcode - 5 bits
+ // bits 22-30 xlength = length - MIN_MATCH_LENGTH - 8 bits
+ // bits 30-32 type 0 = literal 1=EOF 2=Match 3=Unused - 2 bits
+ lengthShift = 22
+ offsetMask = 1<<lengthShift - 1
+ typeMask = 3 << 30
+ matchType = 1 << 30
+ matchOffsetOnlyMask = 0xffff
)
// The length code for length X (MIN_MATCH_LENGTH <= X <= MAX_MATCH_LENGTH)
// is lengthCodes[length - MIN_MATCH_LENGTH]
-var lengthCodes = [...]uint32{
+var lengthCodes = [256]uint8{
0, 1, 2, 3, 4, 5, 6, 7, 8, 8,
9, 9, 10, 10, 11, 11, 12, 12, 12, 12,
13, 13, 13, 13, 14, 14, 14, 14, 15, 15,
@@ -46,7 +52,37 @@
27, 27, 27, 27, 27, 28,
}
-var offsetCodes = [...]uint32{
+// lengthCodes1 is lengthCodes with 1 added to each code, leaving histogram index 0 free for EOB.
+var lengthCodes1 = [256]uint8{
+ 1, 2, 3, 4, 5, 6, 7, 8, 9, 9,
+ 10, 10, 11, 11, 12, 12, 13, 13, 13, 13,
+ 14, 14, 14, 14, 15, 15, 15, 15, 16, 16,
+ 16, 16, 17, 17, 17, 17, 17, 17, 17, 17,
+ 18, 18, 18, 18, 18, 18, 18, 18, 19, 19,
+ 19, 19, 19, 19, 19, 19, 20, 20, 20, 20,
+ 20, 20, 20, 20, 21, 21, 21, 21, 21, 21,
+ 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
+ 22, 22, 22, 22, 22, 22, 22, 22, 22, 22,
+ 22, 22, 22, 22, 22, 22, 23, 23, 23, 23,
+ 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,
+ 23, 23, 24, 24, 24, 24, 24, 24, 24, 24,
+ 24, 24, 24, 24, 24, 24, 24, 24, 25, 25,
+ 25, 25, 25, 25, 25, 25, 25, 25, 25, 25,
+ 25, 25, 25, 25, 25, 25, 25, 25, 25, 25,
+ 25, 25, 25, 25, 25, 25, 25, 25, 25, 25,
+ 26, 26, 26, 26, 26, 26, 26, 26, 26, 26,
+ 26, 26, 26, 26, 26, 26, 26, 26, 26, 26,
+ 26, 26, 26, 26, 26, 26, 26, 26, 26, 26,
+ 26, 26, 27, 27, 27, 27, 27, 27, 27, 27,
+ 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
+ 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
+ 27, 27, 27, 27, 28, 28, 28, 28, 28, 28,
+ 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
+ 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
+ 28, 28, 28, 28, 28, 29,
+}
+
+var offsetCodes = [256]uint32{
0, 1, 2, 3, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7,
8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
@@ -65,33 +101,198 @@
15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
}
-type token uint32
-
-// Convert a literal into a literal token.
-func literalToken(literal uint32) token { return token(literalType + literal) }
-
-// Convert a < xlength, xoffset > pair into a match token.
-func matchToken(xlength uint32, xoffset uint32) token {
- return token(matchType + xlength<<lengthShift + xoffset)
+// offsetCodes14 are offsetCodes, but with 14 added.
+var offsetCodes14 = [256]uint32{
+ 14, 15, 16, 17, 18, 18, 19, 19, 20, 20, 20, 20, 21, 21, 21, 21,
+ 22, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23,
+ 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24,
+ 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25,
+ 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26,
+ 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26,
+ 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
+ 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
+ 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
+ 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
+ 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
+ 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28,
+ 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29,
+ 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29,
+ 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29,
+ 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29,
}
-// Returns the literal of a literal token.
-func (t token) literal() uint32 { return uint32(t - literalType) }
+type token uint32
-// Returns the extra offset of a match token.
+// tokens holds a block of encoded tokens.
+// Histograms are updated as tokens are added.
+// A full block of token storage is allocated up front.
+type tokens struct {
+ extraHist [32]uint16 // codes 256->maxnumlit
+ offHist [32]uint16 // offset codes
+ litHist [256]uint16 // codes 0->255
+ nFilled int
+ n uint16 // Must be able to contain maxStoreBlockSize
+ tokens [65536]token
+}
+
+func (t *tokens) Reset() {
+ if t.n == 0 {
+ return
+ }
+ t.n = 0
+ t.nFilled = 0
+ clear(t.litHist[:])
+ clear(t.extraHist[:])
+ clear(t.offHist[:])
+}
+
+func indexTokens(in []token) tokens {
+ var t tokens
+ t.indexTokens(in)
+ return t
+}
+
+func (t *tokens) indexTokens(in []token) {
+ t.Reset()
+ for _, tok := range in {
+ if tok < matchType {
+ t.AddLiteral(tok.literal())
+ continue
+ }
+ t.AddMatch(uint32(tok.length()), tok.offset()&matchOffsetOnlyMask)
+ }
+}
+
+// emitLiterals adds a run of literal tokens to dst and updates the literal histogram.
+func emitLiterals(dst *tokens, lit []byte) {
+ for _, v := range lit {
+ dst.tokens[dst.n] = token(v)
+ dst.litHist[v]++
+ dst.n++
+ }
+}
+
+func (t *tokens) AddLiteral(lit byte) {
+ t.tokens[t.n] = token(lit)
+ t.litHist[lit]++
+ t.n++
+}
+
+// mFastLog2 is a fast approximate log2, from https://stackoverflow.com/a/28730362
+func mFastLog2(val float32) float32 {
+ ux := int32(math.Float32bits(val))
+ log2 := (float32)(((ux >> 23) & 255) - 128)
+ ux &= -0x7f800001
+ ux += 127 << 23
+ uval := math.Float32frombits(uint32(ux))
+ log2 += ((-0.34484843)*uval+2.02466578)*uval - 0.67487759
+ return log2
+}
+
+// EstimatedBits will return a minimum size estimated by an *optimal*
+// compression of the block.
+func (t *tokens) EstimatedBits() int {
+ shannon := float32(0)
+ bits := int(0)
+ nMatches := 0
+ total := int(t.n) + t.nFilled
+ if total > 0 {
+ invTotal := 1.0 / float32(total)
+ for _, v := range t.litHist[:] {
+ if v > 0 {
+ n := float32(v)
+ shannon += atLeastOne(-mFastLog2(n*invTotal)) * n
+ }
+ }
+ // Just add 15 for EOB
+ shannon += 15
+ for i, v := range t.extraHist[1 : literalCount-256] {
+ if v > 0 {
+ n := float32(v)
+ shannon += atLeastOne(-mFastLog2(n*invTotal)) * n
+ bits += int(lengthExtraBits[i&31]) * int(v)
+ nMatches += int(v)
+ }
+ }
+ }
+ if nMatches > 0 {
+ invTotal := 1.0 / float32(nMatches)
+ for i, v := range t.offHist[:offsetCodeCount] {
+ if v > 0 {
+ n := float32(v)
+ shannon += atLeastOne(-mFastLog2(n*invTotal)) * n
+ bits += int(offsetExtraBits[i&31]) * int(v)
+ }
+ }
+ }
+ return int(shannon) + bits
+}
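EstimatedBits is built on a Shannon-entropy lower bound: a symbol occurring n times out of total contributes n * -log2(n/total) bits, summed over the histogram (plus the extra bits for length/offset codes). The core of that estimate, using exact math rather than the fast log2 approximation:

```go
package main

import (
	"fmt"
	"math"
)

// shannonBits returns the entropy lower bound, in bits, for a symbol
// histogram: the sum over symbols of count * -log2(count/total).
func shannonBits(hist []int) float64 {
	total := 0
	for _, v := range hist {
		total += v
	}
	bits := 0.0
	for _, v := range hist {
		if v > 0 {
			bits += float64(v) * -math.Log2(float64(v)/float64(total))
		}
	}
	return bits
}

func main() {
	// Four equally likely symbols need 2 bits each: 8 symbols -> 16 bits.
	fmt.Println(shannonBits([]int{2, 2, 2, 2}))
}
```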
+
+// AddMatch adds a match to the tokens.
+// This function is very sensitive to inlining and right on the border.
+func (t *tokens) AddMatch(xlength uint32, xoffset uint32) {
+ oCode := offsetCode(xoffset)
+ xoffset |= oCode << 16
+
+ t.extraHist[lengthCodes1[uint8(xlength)]]++
+ t.offHist[oCode&31]++
+ t.tokens[t.n] = token(matchType | xlength<<lengthShift | xoffset)
+ t.n++
+}
+
+// AddMatchLong adds a match to the tokens, potentially longer than max match length.
+// Length should NOT have the base subtracted, only offset should.
+func (t *tokens) AddMatchLong(xlength int32, xoffset uint32) {
+ oc := offsetCode(xoffset)
+ xoffset |= oc << 16
+ for xlength > 0 {
+ xl := xlength
+ if xl > 258 {
+ // We need to have at least baseMatchLength left over for next loop.
+ if xl > 258+baseMatchLength {
+ xl = 258
+ } else {
+ xl = 258 - baseMatchLength
+ }
+ }
+ xlength -= xl
+ xl -= baseMatchLength
+ t.extraHist[lengthCodes1[uint8(xl)]]++
+ t.offHist[oc&31]++
+ t.tokens[t.n] = token(matchType | uint32(xl)<<lengthShift | xoffset)
+ t.n++
+ }
+}
+
+func (t *tokens) AddEOB() {
+ t.tokens[t.n] = token(endBlockMarker)
+ t.extraHist[0]++
+ t.n++
+}
+
+// Slice returns a slice of the tokens that references the tokens in t.
+func (t *tokens) Slice() []token {
+ return t.tokens[:t.n]
+}
+
+// Returns the type of a token
+func (t token) typ() uint32 { return uint32(t) & typeMask }
+
+// Returns the literal of a literal token
+func (t token) literal() uint8 { return uint8(t) }
+
+// Returns the extra offset of a match token
func (t token) offset() uint32 { return uint32(t) & offsetMask }
-func (t token) length() uint32 { return uint32((t - matchType) >> lengthShift) }
+func (t token) length() uint8 { return uint8(t >> lengthShift) }
-func lengthCode(len uint32) uint32 { return lengthCodes[len] }
+// Convert length to code.
+func lengthCode(len uint8) uint8 { return lengthCodes[len] }
-// Returns the offset code corresponding to a specific offset.
+// Returns the offset code corresponding to a specific offset
func offsetCode(off uint32) uint32 {
if off < uint32(len(offsetCodes)) {
- return offsetCodes[off]
+ return offsetCodes[uint8(off)]
}
- if off>>7 < uint32(len(offsetCodes)) {
- return offsetCodes[off>>7] + 14
- }
- return offsetCodes[off>>14] + 28
+ return offsetCodes14[uint8(off>>7)]
}
diff --git a/src/compress/flate/unsafe_disabled.go b/src/compress/flate/unsafe_disabled.go
new file mode 100644
index 0000000..1444494
--- /dev/null
+++ b/src/compress/flate/unsafe_disabled.go
@@ -0,0 +1,33 @@
+// Copyright 2025 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package flate
+
+import (
+ "internal/byteorder"
+)
+
+type indexer interface {
+ int | int8 | int16 | int32 | int64 | uint | uint8 | uint16 | uint32 | uint64
+}
+
+// loadLE8 will load from b at index i.
+func loadLE8[I indexer](b []byte, i I) byte {
+ return b[i]
+}
+
+// loadLE32 will load from b at index i.
+func loadLE32[I indexer](b []byte, i I) uint32 {
+ return byteorder.LEUint32(b[i:])
+}
+
+// loadLE64 will load from b at index i.
+func loadLE64[I indexer](b []byte, i I) uint64 {
+ return byteorder.LEUint64(b[i:])
+}
+
+// storeLE64 will store v at start of b.
+func storeLE64(b []byte, v uint64) {
+ byteorder.LEPutUint64(b, v)
+}
diff --git a/src/compress/flate/writer_test.go b/src/compress/flate/writer_test.go
index c413735..43815b2 100644
--- a/src/compress/flate/writer_test.go
+++ b/src/compress/flate/writer_test.go
@@ -8,6 +8,7 @@
"bytes"
"fmt"
"io"
+ "math"
"math/rand"
"runtime"
"testing"
@@ -40,6 +41,34 @@
})
}
+func TestWriterMemUsage(t *testing.T) {
+ testMem := func(t *testing.T, fn func()) {
+ var before, after runtime.MemStats
+ runtime.GC()
+ runtime.ReadMemStats(&before)
+ fn()
+ runtime.GC()
+ runtime.ReadMemStats(&after)
+ t.Logf("%s: Memory Used: %dKB, %d allocs", t.Name(), (after.HeapInuse-before.HeapInuse)/1024, after.HeapObjects-before.HeapObjects)
+ }
+ data := make([]byte, 100000)
+
+ for level := HuffmanOnly; level <= BestCompression; level++ {
+ t.Run(fmt.Sprint("level-", level), func(t *testing.T) {
+ var zr *Writer
+ var err error
+ testMem(t, func() {
+ zr, err = NewWriter(io.Discard, level)
+ if err != nil {
+ t.Fatal(err)
+ }
+ zr.Write(data)
+ })
+ zr.Close()
+ })
+ }
+}
+
// errorWriter is a writer that fails after N writes.
type errorWriter struct {
N int
@@ -67,7 +96,7 @@
in := buf.Bytes()
// We create our own buffer to control number of writes.
copyBuffer := make([]byte, 128)
- for l := 0; l < 10; l++ {
+ for l := range 10 {
for fail := 1; fail <= 256; fail *= 2 {
// Fail after 'fail' writes
ew := &errorWriter{N: fail}
@@ -110,6 +139,75 @@
}
}
+// TestWriter_Reset tests that the fast encoders produce correct output
+// across internal offset wraparound after repeated resets.
+func TestWriter_Reset(t *testing.T) {
+ buf := new(bytes.Buffer)
+ n := 65536
+ if !testing.Short() {
+ n *= 4
+ }
+ for i := 0; i < n; i++ {
+ fmt.Fprintf(buf, "asdasfasf%d%dfghfgujyut%dyutyu\n", i, i, i)
+ }
+ in := buf.Bytes()
+ for l := range 10 {
+ l := l
+ if testing.Short() && l > 1 {
+ continue
+ }
+ t.Run(fmt.Sprintf("level-%d", l), func(t *testing.T) {
+ t.Parallel()
+ offset := 1
+ if testing.Short() {
+ offset = 256
+ }
+ for ; offset <= 256; offset *= 2 {
+ // Fail after 'fail' writes
+ w, err := NewWriter(io.Discard, l)
+ if err != nil {
+ t.Fatalf("NewWriter: level %d: %v", l, err)
+ }
+ if w.d.fast == nil {
+ t.Skip("Not Fast...")
+ return
+ }
+ for i := 0; i < (bufferReset-len(in)-offset-maxMatchOffset)/maxMatchOffset; i++ {
+ // skip ahead to where we are close to wrap around...
+ w.d.fast.Reset()
+ }
+ w.d.fast.Reset()
+ _, err = w.Write(in)
+ if err != nil {
+ t.Fatal(err)
+ }
+ for range 50 {
+ // skip ahead again... This should wrap around...
+ w.d.fast.Reset()
+ }
+ w.d.fast.Reset()
+
+ _, err = w.Write(in)
+ if err != nil {
+ t.Fatal(err)
+ }
+ for range (math.MaxUint32 - bufferReset) / maxMatchOffset {
+ // skip ahead to where we are close to wrap around...
+ w.d.fast.Reset()
+ }
+
+ _, err = w.Write(in)
+ if err != nil {
+ t.Fatal(err)
+ }
+ err = w.Close()
+ if err != nil {
+ t.Fatal(err)
+ }
+ }
+ })
+ }
+}
+
// Test if two runs produce identical results
// even when writing different sizes to the Writer.
func TestDeterministic(t *testing.T) {
@@ -171,6 +269,24 @@
if !bytes.Equal(b1b, b2b) {
t.Errorf("level %d did not produce deterministic result, result mismatch, len(a) = %d, len(b) = %d", i, len(b1b), len(b2b))
}
+
+ // Test using io.WriterTo interface.
+ var b3 bytes.Buffer
+ br = bytes.NewBuffer(t1)
+ w, err = NewWriter(&b3, i)
+ if err != nil {
+ t.Fatal(err)
+ }
+ _, err = br.WriteTo(w)
+ if err != nil {
+ t.Fatal(err)
+ }
+ w.Close()
+
+ b3b := b3.Bytes()
+ if !bytes.Equal(b1b, b3b) {
+ t.Errorf("level %d (io.WriterTo) did not produce deterministic result, result mismatch, len(a) = %d, len(b) = %d", i, len(b1b), len(b3b))
+ }
}
// TestDeflateFast_Reset will test that encoding is consistent
I spotted some possible problems with your PR:
1. You have a long 82 character line in the commit message body. Please add line breaks to long lines that should be wrapped. Lines in the commit message body should be wrapped at ~76 characters unless needed for things like URLs or tables. (Note: GitHub might render long lines as soft-wrapped, so double-check in the Gerrit commit message shown above.)
2. It looks like you have a properly formatted bug reference, but the convention is to put bug references at the bottom of the commit message, even if a bug is also mentioned in the body of the message.
Please address any problems by updating the GitHub PR.
When complete, mark this comment as 'Done' and click the [blue 'Reply' button](https://go.dev/wiki/GerritBot#i-left-a-reply-to-a-comment-in-gerrit-but-no-one-but-me-can-see-it) above. These findings are based on heuristics; if a finding does not apply, briefly reply here saying so.
To update the commit title or commit message body shown here in Gerrit, you must edit the GitHub PR title and PR description (the first comment) in the GitHub web interface using the 'Edit' button or 'Edit' menu entry there. Note: pushing a new commit to the PR will not automatically update the commit message used by Gerrit.
For more details, see:
(In general for Gerrit code reviews, the change author is expected to [log in to Gerrit](https://go-review.googlesource.com/login/) with a Gmail or other Google account and then close out each piece of feedback by marking it as 'Done' if implemented as suggested or otherwise reply to each review comment. See the [Review](https://go.dev/doc/contribute#review) section of the Contributing Guide for details.)
`return uint32(((u << (64 - 56)) * prime7bytes) >> ((64 - h) & reg8SizeMask64))`

This function is always called with `h: tableBits`, which is hardcoded to `15`.
It is also small enough to be inlined, which means the RHS is always constant-folded and the `& 63` is a no-op. Please remove `& reg8SizeMask64`.
Also, maybe it doesn't make sense to take `h` as an arg given that literally everyone calls it with the same `const`.
`// The function does not get inlined if we "& 63" the shift.`

I don't understand this comment: `(c.code64() << (w.nbits & reg8SizeMask64)) & 63` doesn't look like it would implement what you need to implement.
Assuming you meant `w.nbits & 63`, the comment disagrees with the code, doesn't it?
Or more exactly, why point out that one operator is making this function exceed the inline budget?
Fixed up some test stuff. Not sure about TestIssue59208, though. It uses zlib to inject some data, but doesn't seem to adjust to the output length.
`return uint32(((u << (64 - 56)) * prime7bytes) >> ((64 - h) & reg8SizeMask64))`

This function is always called with `h: tableBits`, which is hardcoded to `15`.
It is also small enough to be inlined, which means the RHS is always constant-folded and the `& 63` is a no-op. Please remove `& reg8SizeMask64`.
Also, maybe it doesn't make sense to take `h` as an arg given that literally everyone calls it with the same `const`.
I will just remove it and use hashLen - this is a leftover from when hashLen wouldn't be inlined.
`// The function does not get inlined if we "& 63" the shift.`

I don't understand this comment: `(c.code64() << (w.nbits & reg8SizeMask64)) & 63` doesn't look like it would implement what you need to implement.
Assuming you meant `w.nbits & 63`, the comment disagrees with the code, doesn't it?
Or more exactly, why point out that one operator is making this function exceed the inline budget?
It is a "note-to-self" to not use this for performance critical stuff. I will make it more clear.
A first pass through the "easy" parts to review. I'll look at the dense code later.
`benchmark old ns/op new ns/op delta`

This doesn't quite look like benchstat output?
https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
`Change-Id: Icb4c9839dc8f1bb96fd6d548038679a7554a559b`

duplicate change id?
`func TestWriterClose(t *testing.T) {`

any reason to drop this test?
`for l := 4; l < 9; l++ {`

I think for a lot of tests, if you're going to run them over multiple levels, you might as well wrap them in subtests:

```
for l := ... {
	t.Run(fmt.Sprintf("level=%d", l), func(t *testing.T) { // or lift the helper func out
```
`t.Parallel()`

is it unsafe for parallel? same question for the other places t.Parallel was removed.
func TestMain(m *testing.M) {
	flag.Parse()
	os.Exit(m.Run())
}

I don't think this is necessary. TestMain doesn't need flags; by the time the test is called, flags will have been parsed.
`t.Fatal(msg + "short write")`

All these Fatal calls should probably just do `t.Fatal(msg, "short write")` / `t.Fatal(msg, err)` for proper space joining.
`}`

can you reorder the types / methods before this line so that methods follow the type they're defined on?
`// return An integer array in which array[i] indicates the number of literals`

comment alignment looks strange; maybe go back to using a paragraph per argument?
`y, z, w = y[:len(x)], z[:len(x)], w[:len(x)]`

why not do the complete slice in one go? b[n:n*2], b[n*2:n*3], etc
`// Sort sorts data.`

i suppose there's a benchmark somewhere that shows it's faster than slices.SortFunc?
same for sortByLiteral
`func TestWriterMemUsage(t *testing.T) {`

It doesn't look like this is a test that can fail; should it be a benchmark instead?
`case level >= 1 && level <= 6:`

```suggestion
case 1 <= level && level <= 6:
```
`func (w *Writer) ResetDict(dst io.Writer, dict []byte) {`

Unexport this; there's no approved proposal for new API.
for i := range w.literalFreq[:] {
	w.literalFreq[i] = 0
}
if !w.lastHuffMan {
	for i := range w.offsetFreq[:] {
		w.offsetFreq[i] = 0
	}
}

use the clear builtin
`// Copyright 2009 The Go Authors. All rights reserved.`

new copyright years should be current
for i := range e.table[:] {
	e.table[i] = tableEntry{}
}

use clear
Thanks for the great feedback! Most of it was stuff that had been fixed/added after the fork, so I brought it back.
`benchmark old ns/op new ns/op delta`

This doesn't quite look like benchstat output?
https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
Acknowledged
`Change-Id: Icb4c9839dc8f1bb96fd6d548038679a7554a559b`

Klaus Post
duplicate change id?
Acknowledged
```suggestion
case 1 <= level && level <= 6:
```
Acknowledged
Unexport this; there's no approved proposal for new API.
Removing to not leave dead code.
`if len(y) >= minMatchLength {`

Klaus Post
undo these changes?
Acknowledged
`func TestWriterClose(t *testing.T) {`

any reason to drop this test?
Acknowledged
`for l := 4; l < 9; l++ {`

I think for a lot of tests, if you're going to run them over multiple levels, you might as well wrap them in subtests:

```
for l := ... {
	t.Run(fmt.Sprintf("level=%d", l), func(t *testing.T) { // or lift the helper func out
```
Acknowledged
`t.Parallel()`

is it unsafe for parallel? same question for the other places t.Parallel was removed.
Acknowledged
comment on why all these were dropped?
Acknowledged. Tests weren't ported. I have re-added them.
func TestMain(m *testing.M) {
	flag.Parse()
	os.Exit(m.Run())
}

I don't think this is necessary. TestMain doesn't need flags; by the time the test is called, flags will have been parsed.
Acknowledged
`t.Fatal(msg + "short write")`

All these Fatal calls should probably just do `t.Fatal(msg, "short write")` / `t.Fatal(msg, err)` for proper space joining.
Acknowledged
for i := range w.literalFreq[:] {
	w.literalFreq[i] = 0
}
if !w.lastHuffMan {
	for i := range w.offsetFreq[:] {
		w.offsetFreq[i] = 0
	}
}

Klaus Post
use the clear builtin
Acknowledged
can you reorder the types / methods before this line so that methods follow the type they're defined on.
Acknowledged
`// return An integer array in which array[i] indicates the number of literals`

comment alignment looks strange; maybe go back to using a paragraph per argument?
Acknowledged
`if len(b) >= 8<<10 {`

1 << 13 might be easier to read
I try to follow the x << 10 (for clear KiB), etc, whenever reasonable.
`y, z, w = y[:len(x)], z[:len(x)], w[:len(x)]`

why not do the complete slice in one go? b[n:n*2], b[n*2:n*3], etc
To make it clear to the compiler that 'len(y, z, w) == len(x)', so there are no bounds checks. Last time the compiler couldn't infer it.
`// Sort sorts data.`

i suppose there's a benchmark somewhere that shows it's faster than slices.SortFunc?
same for sortByLiteral
I haven't tested it in a very long while. I had very significant improvement when added... https://github.com/klauspost/compress/pull/207#issuecomment-575072889 (numbers are for the full encode impact)
But that was replacing sort.Sort.
new copyright years should be current
Acknowledged
for i := range e.table[:] {
	e.table[i] = tableEntry{}
}

Klaus Post
use clear
Acknowledged
`// +build !amd64`

Klaus Post
drop the old +build
Acknowledged
`// bits 16-22 offsetcode - 5 bits`

Klaus Post
mixed space / tabs
Acknowledged
`func TestWriterMemUsage(t *testing.T) {`

It doesn't look like this is a test that can fail; should it be a benchmark instead?
It is not exactly, but allocs seem close enough to static usage. Changing it to a benchmark.
`// Sort sorts data.`

Klaus Post
i suppose there's a benchmark somewhere that shows it's faster than slices.SortFunc?
same for sortByLiteral
I haven't tested it in a very long while. I had very significant improvement when added... https://github.com/klauspost/compress/pull/207#issuecomment-575072889 (numbers are for the full encode impact)
But that was replacing sort.Sort.
That seems like it was from ~2020, which sounds like it would be before pdqsort was implemented in the Go stdlib (I think Go 1.19 / 2022: https://go.dev/doc/go1.19#sortpkgsort)?
If so, might be worth at least trying a simple experiment to switch back to a stdlib version (to help minimize code to review/maintain/etc.)?
`// Sort sorts data.`

Klaus Post
i suppose there's a benchmark somewhere that shows it's faster than slices.SortFunc?
same for sortByLiteral

thepudds
I haven't tested it in a very long while. I had very significant improvement when added... https://github.com/klauspost/compress/pull/207#issuecomment-575072889 (numbers are for the full encode impact)
But that was replacing sort.Sort.
That seems like it was from ~2020, which sounds like it would be before pdqsort was implemented in the Go stdlib (I think Go 1.19 / 2022: https://go.dev/doc/go1.19#sortpkgsort)?
If so, might be worth at least trying a simple experiment to switch back to a stdlib version (to help minimize code to review/maintain/etc.)?
Sounds very reasonable. I will give it a shot and report back.
`if len(b) >= 8<<10 {`

Klaus Post
1 << 13 might be easier to read
I try to follow the x << 10 (for clear KiB), etc, whenever reasonable.
Acknowledged
`y, z, w = y[:len(x)], z[:len(x)], w[:len(x)]`

Klaus Post
why not do the complete slice in one go? b[n:n*2], b[n*2:n*3], etc
To make it clear to the compiler that 'len(y, z, w) == len(x)', so there are no bounds checks. Last time the compiler couldn't infer it.
Acknowledged
`func TestWriterMemUsage(t *testing.T) {`

Klaus Post
It doesn't look like this is a test that can fail; should it be a benchmark instead?
It is not exactly, but allocs seem close enough to static usage. Changing it to a benchmark.
I spotted some possible problems with your PR:
1. You have a long 82 character line in the commit message body. Please add line breaks to long lines that should be wrapped. Lines in the commit message body should be wrapped at ~76 characters unless needed for things like URLs or tables. (Note: GitHub might render long lines as soft-wrapped, so double-check in the Gerrit commit message shown above.)
2. It looks like you have a properly formatted bug reference, but the convention is to put bug references at the bottom of the commit message, even if a bug is also mentioned in the body of the message.

Please address any problems by updating the GitHub PR.
When complete, mark this comment as 'Done' and click the [blue 'Reply' button](https://go.dev/wiki/GerritBot#i-left-a-reply-to-a-comment-in-gerrit-but-no-one-but-me-can-see-it) above. These findings are based on heuristics; if a finding does not apply, briefly reply here saying so.
To update the commit title or commit message body shown here in Gerrit, you must edit the GitHub PR title and PR description (the first comment) in the GitHub web interface using the 'Edit' button or 'Edit' menu entry there. Note: pushing a new commit to the PR will not automatically update the commit message used by Gerrit.
For more details, see:
- [how to update commit messages](https://go.dev/wiki/GerritBot/#how-does-gerritbot-determine-the-final-commit-message) for PRs imported into Gerrit.
- the Go project's [conventions for commit messages](https://go.dev/doc/contribute#commit_messages) that you should follow.
(In general for Gerrit code reviews, the change author is expected to [log in to Gerrit](https://go-review.googlesource.com/login/) with a Gmail or other Google account and then close out each piece of feedback by marking it as 'Done' if implemented as suggested or otherwise reply to each review comment. See the [Review](https://go.dev/doc/contribute#review) section of the Contributing Guide for details.)
Done
`// Sort sorts data.`

Klaus Post
i suppose there's a benchmark somewhere that shows it's faster than slices.SortFunc?
same for sortByLiteral

thepudds
I haven't tested it in a very long while. I had very significant improvement when added... https://github.com/klauspost/compress/pull/207#issuecomment-575072889 (numbers are for the full encode impact)
But that was replacing sort.Sort.

Klaus Post
That seems like it was from ~2020, which sounds like it would be before pdqsort was implemented in the Go stdlib (I think Go 1.19 / 2022: https://go.dev/doc/go1.19#sortpkgsort)?
If so, might be worth at least trying a simple experiment to switch back to a stdlib version (to help minimize code to review/maintain/etc.)?
Sounds very reasonable. I will give it a shot and report back.
I tested sort.Slice, which was clearly slower and allocated.
Using slices.SortFunc as you proposed the difference was minimal - somewhere in the 97 to 99% speed on the fastest settings when eliminating the branch.
That seems good enough to not warrant the extra code, so I removed it. Output remains the same.
`// Sort sorts data.`

Klaus Post
i suppose there's a benchmark somewhere that shows it's faster than slices.SortFunc?
same for sortByLiteral

thepudds
I haven't tested it in a very long while. I had very significant improvement when added... https://github.com/klauspost/compress/pull/207#issuecomment-575072889 (numbers are for the full encode impact)
But that was replacing sort.Sort.

Klaus Post
That seems like it was from ~2020, which sounds like it would be before pdqsort was implemented in the Go stdlib (I think Go 1.19 / 2022: https://go.dev/doc/go1.19#sortpkgsort)?
If so, might be worth at least trying a simple experiment to switch back to a stdlib version (to help minimize code to review/maintain/etc.)?

Klaus Post
Sounds very reasonable. I will give it a shot and report back.
I tested sort.Slices - which was clearly slower - and allocated.
Using slices.SortFunc as you proposed the difference was minimal - somewhere in the 97 to 99% speed on the fastest settings when eliminating the branch.
That seems good enough to not warrant the extra code, so I removed it. Output remains the same.
Thanks, that seems like a good change to make.
Also, some of the earlier trybot failures were due to some of the sort code, in addition to some nitpicky failures around exported identifiers and whatnot, so I am going to re-run the tests to see what the latest complaints are.
I see there are some failures where compiler inlining tests are being run on specific functions:
and
If anyone has guidance on how to fix those, that would be greatly appreciated. My only real solution I have is to delete them.
I see there are some failures where compiler inlining tests are being run on specific functions:
and
If anyone has guidance on how to fix those, that would be greatly appreciated. My only real solution I have is to delete them.
Hi Klaus, see my comment here from a couple of hours ago. I believe you can delete the inlining tests in TestIntendedInlining that are no longer applicable if you have deleted the corresponding code in the compress packages.
(I believe those are not tests of the inliner itself, but rather they capture assertions that specific functions continue to get inlined as different people refactor the stdlib and runtime. This is similar in spirit to what one can do externally with things like https://github.com/jordanlewis/gcassert, where someone might want to "lock in" that some performance-critical function gets inlined and see a test fail if for example someone else later adds too much code to some function).
That's my general understanding at least -- I did not really dig into your specific examples.
| Commit-Queue | +1 |
thepudds
I see there are some failures where compiler inlining tests are being run on specific functions:
and
If anyone has guidance on how to fix those, that would be greatly appreciated. My only real solution I have is to delete them.
Hi Klaus, see my comment here from a couple of hours ago. I believe you can delete the inlining tests in TestIntendedInlining that are no longer applicable if you have deleted the corresponding code in the compress packages.
(I believe those are not tests of the inliner itself, but rather they capture assertions that specific functions continue to get inlined as different people refactor the stdlib and runtime. This is similar in spirit to what one can do externally with things like https://github.com/jordanlewis/gcassert, where someone might want to "lock in" that some performance-critical function gets inlined and see a test fail if for example someone else later adds too much code to some function).
That's my general understanding at least -- I did not really dig into your specific examples.
Hi @klau...@gmail.com, you might want to try rebasing to a more recent commit, which might address the x/tools failure. (I don't know for sure it will help -- rather, it's just that in general x/tools can get out of sync with a CL if the CL's parent commit is too old, which rebasing often addresses).
That might also help with the DWARF failure, though I guess we'll see -- maybe it's a flake, or maybe some non-zero chance a DWARF failure could be related to a compress change, but that's with zero investigation on my part.😊
Just a quick FYI that if you do rebase, probably don't rebase to latest master because latest master might be having some problems.
You probably could rebase on commit 41f5659, which is a few commits back and might be before the problems. (That commit worked elsewhere).
| Commit-Queue | +1 |
I have narrowed it down to zlib usage in cmd/link/internal/ld/data.go
Here in particular: https://github.com/golang/go/blob/master/src/cmd/link/internal/ld/data.go#L3213-L3245
It seems extremely fragile. Changing the final check to `if true || int64(buf.Len()) >= total {` (always fail size check) will make it fail as well (on tip).
It seems like `compressSyms` returning nil isn't handled correctly by the caller and by chance it isn't hit currently. Removing the size check fixes it on windows/386 for me.
Let me know which direction you'd like to take the fix?
I have narrowed it down to zlib usage in cmd/link/internal/ld/data.go
Here in particular: https://github.com/golang/go/blob/master/src/cmd/link/internal/ld/data.go#L3213-L3245
It seems extremely fragile. Changing the final check to `if true || int64(buf.Len()) >= total {` (always fail size check) will make it fail as well (on tip).
It seems like `compressSyms` returning nil isn't handled correctly by the caller and by chance it isn't hit currently. Removing the size check fixes it on windows/386 for me.
Let me know which direction you'd like to take the fix?
Hi Klaus, in the interests of minimizing load on you, one approach could be:
1. you file a very short bug report against the linker not handling compressSyms returning nil in the case of the size check. That most likely will result in someone-who-is-not-you working out what the proper fix is, how to test it, and so on.
2. you do a temporary and cheap (to you) workaround in this CL. I'm not sure what that might be, but perhaps temporarily change it to `zlib.NewWriterLevel(&buf, zlib.DefaultCompression)` or `zlib.NewWriterLevel(&buf, 2)` or whatever seems like a simple workaround to you. This gets the tests passing here in this CL.
3. you leave a TODO there to revert your temporary workaround once the linker bug is fixed, with the intent of reverting the workaround prior to this CL being merged.
Making up a time frame, the linker bug might be merged a week from now, or whenever it happens, my guess would be someone could land a fix almost certainly faster than the general review & tweaking here in this CL.
In any event, that's just one possible approach, and you could pick a different path. (And sorry if that doesn't make sense as an approach -- I did not really dig in here).
Separately, my guess is this example will probably trigger some discussion of whether other code might similarly not be robust against changes in the size of output, maybe especially for BestSpeed. Given that conversation is probably going to happen, I will post briefly on the main #75532 issue, which is probably a better place to have that conversation than Gerrit.
thepudds
I have narrowed it down to zlib usage in cmd/link/internal/ld/data.go
Here in particular: https://github.com/golang/go/blob/master/src/cmd/link/internal/ld/data.go#L3213-L3245
It seems extremely fragile. Changing the final check to `if true || int64(buf.Len()) >= total {` (always fail size check) will make it fail as well (on tip).
It seems like `compressSyms` returning nil isn't handled correctly by the caller and by chance it isn't hit currently. Removing the size check fixes it on windows/386 for me.
Let me know which direction you'd like to take the fix?
Hi Klaus, in the interests of minimizing load on you, one approach could be:
1. you file a very short bug report against the linker not handling compressSyms returning nil in the case of the size check. That most likely will result in someone-who-is-not-you working out what the proper fix is, how to test it, and so on.
2. you do a temporary and cheap (to you) workaround in this CL. I'm not sure what that might be, but perhaps temporarily change it to `zlib.NewWriterLevel(&buf, zlib.DefaultCompression)` or `zlib.NewWriterLevel(&buf, 2)` or whatever seems like a simple workaround to you. This gets the tests passing here in this CL.
3. you leave a TODO there to revert your temporary workaround once the linker bug is fixed, with the intent of reverting the workaround prior to this CL being merged.
Making up a time frame, the linker bug might be merged a week from now, or whenever it happens, my guess would be someone could land a fix almost certainly faster than the general review & tweaking here in this CL.
In any event, that's just one possible approach, and you could pick a different path. (And sorry if that doesn't make sense as an approach -- I did not really dig in here).
Separately, my guess is this example will probably trigger some discussion of whether other code might similarly not be robust against changes in the size of output, maybe especially for BestSpeed. Given that conversation is probably going to happen, I will post briefly on the main #75532 issue, which is probably a better place to have that conversation than Gerrit.
Sounds good! I will disable the check for now and add a TODO linking to the issue when I've added it.
In many cases levels 5-6 are around the same speed as the current level 1. See the last 3 benchmarks on https://stdeflate.klauspost.com/ - they deal with smaller data sizes. So it would very likely be similar speed.
So there is the option to keep level 1 and take the extra speed or use level 5-6 and take the smaller output. But let's take that elsewhere.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Hi @klau...@gmail.com, you might want to try rebasing to a more recent commit, which might address the x/tools failure. (I don't know for sure it will help -- rather, it's just that in general x/tools can get out of sync with a CL if the CL's parent commit is too old, which rebasing often addresses).
That might also help with the DWARF failure, though I guess we'll see -- maybe it's a flake, or maybe some non-zero chance a DWARF failure could be related to a compress change, but that's with zero investigation on my part.😊
Just a quick FYI that if you do rebase, probably don't rebase to latest master because latest master might be having some problems.
You probably could rebase on commit 41f5659, which is a few commits back and might be before the problems. (That commit worked elsewhere).
Acknowledged
I have disabled the check for now - it will likely just be a few extra bytes - and it only triggered on windows/386 anyway. Added https://github.com/golang/go/issues/76022
| Commit-Queue | +1 |
It seems like there are also some readers that assume the decompressed data is always the same size as or smaller than the input:
I guess we will have to wait for the writer to be fixed.
Hi Klaus, FWIW, I was able to get it to pass the trybots by taking your patchset 12 but applying the temporary workaround I had suggested earlier of just changing the compression level in `cmd/link/internal/ld` (and rolling back your other attempted workaround):
```
diff --git a/src/cmd/link/internal/ld/data.go b/src/cmd/link/internal/ld/data.go
index e0de97a0fc..bb063a1315 100644
--- a/src/cmd/link/internal/ld/data.go
+++ b/src/cmd/link/internal/ld/data.go
@@ -3210,7 +3210,8 @@ func compressSyms(ctxt *Link, syms []loader.Sym) []byte {
// compression levels of zlib.DefaultCompression, but takes
// substantially less time. This is important because DWARF
// compression can be a significant fraction of link time.
- z, err := zlib.NewWriterLevel(&buf, zlib.BestSpeed)
+ // TODO: switch back to zlib.BestSpeed when https://github.com/golang/go/issues/76022 is resolved.
+ z, err := zlib.NewWriterLevel(&buf, zlib.BestCompression)
if err != nil {
log.Fatalf("NewWriterLevel failed: %s", err)
}
@@ -3243,8 +3244,7 @@ func compressSyms(ctxt *Link, syms []loader.Sym) []byte {
if err := z.Close(); err != nil {
log.Fatalf("compression failed: %s", err)
}
- // TODO: Re-enable check when https://github.com/golang/go/issues/76022 is resolved.
- if false && int64(buf.Len()) >= total {
+ if int64(buf.Len()) >= total {
// Compression didn't save any space.
return nil
}
```
I also rebased to latest master (839da71f8907), though I don't know if that was needed.
(My attempt did pass the windows trybots that were failing earlier here, but one hopefully minor caveat is that my attempt did have one trybot failure on gotip-linux-arm64 for `runtime.TestGdbBacktrace`, but I think that's a somewhat common flake, perhaps #58932 or similar, and hopefully that is unrelated to any of your changes).
So you could try that, or you could alternatively wait to see what happens with #76022. (Finally, I hope I did not make some silly mistake in my quick test).
Yes, after CL 714461 it can now pass the tests with the workaround. I have merged master.
I would prefer to keep the TODO, since we probably want to keep "Fastest" as the option, and until we can write the uncompressed data safely, I think we should keep it, so there isn't a chance of random errors depending on input (unless I am misunderstanding something in the linker issue).
In other words, this would be "acceptable" to release, and I'd prefer that. We only really risk that binaries are a few bytes bigger than they need to be.
Hi Klaus, I think master might be having problems, which likely explains several of the trybot failures here. (See for example https://build.golang.org/).
You could wait a bit for that to be resolved (I would guess later today), or you could go backwards to rebase onto a slightly earlier commit on master, such as 5dcaf9a (which I believe is a good one from a day or so ago).
No problem. Sorry for fumbling a bit. Not really into all these tools :)
Hi Klaus, FWIW, I think master is likely back to better health at this point if you want to try rebasing whenever convenient.
| Commit-Queue | +1 |
| Commit-Queue | +1 |
I feel a bit deadlocked here. TestGdbPython also appears to have an issue with reading the sections it expects. The option to set the value as uncompressed (returning nil) feels like the wrong solution since it could still randomly trigger on incompressible output.
I don't feel good about changing the compression to a higher level. We can't release with that, since it would be a speed degradation.
Suggestions are welcome.
Hi Klaus, my suggestion would be as a _temporary_ measure, change the compression in the linker to a higher level -- however high is necessary to _temporarily_ get the tests passing.
(I was able to get it to pass the trybots with `zlib.NewWriterLevel(&buf, zlib.BestCompression)`, but maybe I did something wrong; see a few comments back from me).
Getting the tests to pass on the trybots might have some benefits:
1. It allows the linker work (e.g., #76022) vs. compression work to proceed more independently.
2. Seeing green trybots is an indication to various reviewers and the core team that this work is generally going well. (Some reviewers on the core team say they generally don't want to review a CL with failing trybots, because it can be less efficient for them).
3. There is some chance some other problem appears after the trybots are running more, which might then be something more directly related to the compression work, which then can be investigated completely separately from any linker issues.
If `TestGdbPython` still fails even with the highest compression, then a temporary `Skip` could be added to that test, again with the goal of seeing the trybots complete end-to-end and get to green (unless some other currently unknown problem occurs first, which will be useful data), and the `Skip` can be removed before merging this CL once the linker (or whatever else) is fixed.
Finally, I did not look at the `TestGdbPython` failures at all, so maybe what I've said here does not make sense, but I guess my general advice would be to use temporary measures if needed to get to green trybots (unless it seems to be a large or never ending stream of other issues, in which case some other strategy might be needed).
Looks to me like we should modify the test to check for the error "unable to get decompressed section .debug_gdb_scripts" and to skip the test if we see that error message. See also this recent patch to gdb: https://patchwork.sourceware.org/project/gdb/patch/20251010092049.39...@mbosch.me/
It is also possible that since this seems to be such a recent change to gdb, we should deliberately avoid compressing the .debug_gdb_scripts section for now.
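The suggested guard could look something like the sketch below. The helper name `shouldSkipGdbTest` is hypothetical; the error string is the one quoted above. In the real test this check would wrap a `t.Skipf` inside `TestGdbPython` when gdb itself cannot decompress the section.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldSkipGdbTest reports whether the gdb output contains the error
// emitted when gdb cannot decompress .debug_gdb_scripts, in which case
// the test should be skipped rather than failed.
func shouldSkipGdbTest(gdbOutput string) bool {
	return strings.Contains(gdbOutput,
		"unable to get decompressed section .debug_gdb_scripts")
}

func main() {
	fmt.Println(shouldSkipGdbTest("warning: unable to get decompressed section .debug_gdb_scripts")) // true
	fmt.Println(shouldSkipGdbTest("Breakpoint 1 at main.main"))                                      // false
}
```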
Thanks for working on this!
With CL 721340, #76022 should be fixed.
That said, the linker should choose a compression level that is beneficial in most cases. The comment there mentions that BestSpeed achieves similar compression to the default (in the old code). If this is no longer true, we should change the level. (But that makes me wonder: even though the levels are sort of arbitrary, if BestSpeed does not result in a reduced size in common cases, it seems wrong.)
Disabling the check is also not what we want to do in the linker. If the compression doesn't help, we shouldn't apply it.
It should be fine to deliberately avoid compressing the .debug_gdb_scripts section. In fact, currently the section is usually not actually compressed. The section is usually very small, and adding the compression header may make the size longer, therefore in the usual case it is not actually compressed.
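The header-overhead point is easy to demonstrate: for a very small payload, the zlib framing (2-byte header plus 4-byte Adler-32 trailer) plus the deflate block already exceeds the input size. A quick sketch, with a made-up stand-in payload for a tiny .debug_gdb_scripts section:

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
)

// compressedLen returns the zlib BestSpeed output size for data.
func compressedLen(data []byte) int {
	var buf bytes.Buffer
	w, _ := zlib.NewWriterLevel(&buf, zlib.BestSpeed)
	w.Write(data)
	w.Close()
	return buf.Len()
}

func main() {
	small := []byte("gdb.inlined-script") // 18 bytes, mostly unique
	// Framing overhead dominates, so the "compressed" form is larger than
	// the original and the linker's size check leaves it uncompressed.
	fmt.Println(compressedLen(small) > len(small)) // true
}
```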
I merged master and removed the workaround. Tests are passing locally, but of course it isn't testing all platforms. This seems like the final implementation we want (fast and skip if no benefit).
| Commit-Queue | +1 |
Sorry. Didn't realize votes weren't sticky.
| Commit-Queue | +1 |
| Code-Review | +2 |
I have narrowed it down to zlib usage in cmd/link/internal/ld/data.go
Thanks. How is the compression ratio in the linker now? Does "BestSpeed achieves similar compression to the default" still apply? Is BestSpeed still a good choice there?
Just some drive-by comments. Haven't reviewed the actual compression code (which I'm not an expert on anyway).
Thanks!
Encode/Digits/Huffman/1e4-32 11.4µs ± 0% 8.0µs ± 0% ~ (p=1.000 n=1+1)
Could you run the benchmarks more times, so benchstat can do a statistical analysis? Thanks.
goto loop // Avoid for-loop so that this function can be inlined
Is this still necessary? We can now inline functions with for loops, and it shouldn't be too costly (in terms of inlining budget) compared to a goto loop.
// Determinism checks will usually not be reproducible,
// since it often relies on the internal state of the compressor.
I'm not sure I follow what this comment means. It seems the code does check that compression is deterministic. Is the comment saying that when it fails, it may not be easy to reproduce?
bTable [tableSize]tableEntry
The field is named bTable, but it doesn't use bTableSize. bTable probably has different meanings in the two contexts? Maybe use a different name.
reg8SizeMask64 = 63
This is to signal to the compiler that the shift is in range, so it can remove a branch? Perhaps mention that in the comment.
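If I'm reading the constant right, the pattern presumably looks like the sketch below (the function name is hypothetical; only the constant name comes from the CL). Go defines `x >> s` as 0 for `s >= 64`, which forces a range check (branch or CMOV) for variable shifts; masking the count with 63 proves it is in [0, 63], so the compiler can drop that check.

```go
package main

import "fmt"

// reg8SizeMask64 matches the constant name in the CL; this usage is an
// illustrative sketch, not the CL's actual code.
const reg8SizeMask64 = 63

// shiftRight masks the shift count so the compiler can prove it is in
// [0, 63] and omit the out-of-range check that Go's shift semantics
// would otherwise require for variable shift amounts.
func shiftRight(x uint64, s uint) uint64 {
	return x >> (s & reg8SizeMask64)
}

func main() {
	fmt.Println(shiftRight(1<<10, 2) == 1<<8) // true
	fmt.Println(shiftRight(8, 67) == 8>>3)    // true: 67&63 == 3
}
```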
I'm not sure about the name of the file. It is usually the default with no unsafe code, so "unsafe disabled" doesn't need to be called out. Maybe rename.
Could add a comment mentioning that they stay in a separate file in case we add unsafe version later.
// Insert zlib compressed sec.Data() block with `[]byte{1, 0, 0, 0}` as the first 4 bytes
Is this change necessary? Did the old code stop working?
| Code-Review | +0 |
| Commit-Queue | +1 |
It is a tradeoff. I'd expect that speed would be preferred. When comparing .a files the speed drops from 428MiB/s -> 290MiB/s going from level 1 to 5, but the size saving also goes from 81.5% -> 84.5%. It is a knob, but I feel people like you may have a better perspective, therefore I've just left it.
This being the most "neutral" change makes sense to me. I'd leave it to others to find the value that makes sense the broadest - with time cost/size tradeoffs properly benchmarked.
Addressed issues, updated copyright on new files and merged master.
Encode/Digits/Huffman/1e4-32 11.4µs ± 0% 8.0µs ± 0% ~ (p=1.000 n=1+1)
Could you run the benchmarks more times, so benchstat can do a statistical analysis? Thanks.
Acknowledged. Though these are merely a hint at performance. More data types are needed for the full picture. It may take a bit to get numbers on a 'quiet' system, so bear with me while I work it out.
goto loop // Avoid for-loop so that this function can be inlined
Is this still necessary? We can now inline functions with for loops, and it shouldn't be too costly (in terms of inlining budget) compared to a goto loop.
Seeing it now, it is not too relevant for this PR. I will just revert this. There is a separate CL/PR for inflate anyway.
// Determinism checks will usually not be reproducible,
// since it often relies on the internal state of the compressor.
I'm not sure I follow what this comment means. It seems the code does check that compression is deterministic. Is the comment saying that when the check fails, the failure may not be easy to reproduce?
Yes. There is a "trick" used to avoid clearing the hash tables. This, plus the fact that the hash tables are lossy, means that in this case you have to be quite specific to detect non-determinism.
Long story:
The "trick" is that backreferences are marked with their offset. This offset has a "base" value. This is then subtracted to find the actual offset. Since we want to keep the hash table reasonably small, the offsets are 32 bits. This means that every 2GiB we reset the base value to zero and adjust the offsets.
When a "new" encode is started we simply increase the base value by the maximum allowed offset instead of clearing the table. We must always check the offset against the maximum offset before allowing any match, therefore previous content cannot affect current content.
This fuzz test checks that doing two encodes with similar parameters on the same data always produces the same output. However, it will not be able to feed more than 2GB data, so failures here are non-deterministic, since it relies on the data fed from previous runs.
This means that a failure with a certain hash will *not* reproduce the issue since the encoder will likely not have the internal state to trigger on a "clean run". It is however IMO better to have this trigger in case there is an issue, even if it will not reproduce. Consider it similar to the race detector in that regard.
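The base-offset trick described above can be sketched as a small standalone program. All names and sizes here are hypothetical simplifications for illustration, not the CL's actual code; the real encoder uses much larger tables and also rewrites the table when the 32-bit base nears overflow.

```go
package main

import "fmt"

const maxMatchOffset = 32 << 10 // 32 KiB DEFLATE window

// encoder holds a lossy hash table of absolute input offsets.
type encoder struct {
	table [256]int32 // tiny table for illustration only
	cur   int32      // base added to offsets stored in table
}

// reset starts a "new" stream without clearing the table: bumping cur by
// the maximum offset makes every stale entry fail the range check below.
// (The real code also accounts for the length of the processed history.)
func (e *encoder) reset() {
	e.cur += maxMatchOffset
}

// usable reports whether a stored candidate offset may serve as a match
// source for the position at absolute offset s: it must be within the
// window, which automatically rejects entries from before the last reset.
func (e *encoder) usable(candidate, s int32) bool {
	return s-candidate < maxMatchOffset
}

func main() {
	var e encoder
	e.table[0] = e.cur + 100                     // store a position from stream 1
	fmt.Println(e.usable(e.table[0], e.cur+200)) // same stream, in range
	e.reset()
	fmt.Println(e.usable(e.table[0], e.cur+200)) // stale entry rejected
}
```

The offset check has to run for every candidate match anyway, so skipping the table clear on reset costs nothing extra on the hot path.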
The field is named bTable, but it doesn't use bTableSize. bTable probably has different meanings in the two contexts? Maybe use a different name.
Acknowledged
This is to signal the compiler that the shift is in range, so it can remove a branch? Perhaps mention that in the comment.
Correct. On platforms where this mask would be compiled into an actual AND instruction, we use 255 as the AND value instead, in which case it is correctly eliminated. I will add a short note. Acknowledged
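The point being discussed can be shown with a minimal sketch (the constant name is taken from the quoted snippet; the function is hypothetical): Go defines shifts by a count >= 64 as producing 0, so the compiler normally emits a range check, but masking the count with 63 proves it is in range and lets the check be dropped on amd64, while a 255 mask on a uint8 count is a no-op everywhere.

```go
package main

import "fmt"

const reg8SizeMask64 = 63 // name from the CL; 63 masks a 64-bit shift count

// shiftMasked hints to the compiler that the shift count is always < 64,
// so it can omit the "count >= 64" branch that Go shifts otherwise need.
func shiftMasked(x uint64, n uint8) uint64 {
	return x << (n & reg8SizeMask64)
}

func main() {
	fmt.Println(shiftMasked(1, 8))                         // 256
	fmt.Println(shiftMasked(1, 70) == shiftMasked(1, 6))   // mask wraps 70 to 6
}
```

The observable behavior is unchanged for in-range counts; only the generated code differs.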
I'm not sure about the name of the file. It is usually the default with no unsafe code, so "unsafe disabled" doesn't need to be called out. Maybe rename.
Could add a comment mentioning that they stay in a separate file in case we add unsafe version later.
Good idea. Renamed it `load_store.go` and added a comment. Marked as resolved.
// Insert zlib compressed sec.Data() block with `[]byte{1, 0, 0, 0}` as the first 4 bytes
Is this change necessary? Did the old code stop working?
The previous test relied on changing specific bytes in the *compressed* data, so it relied on a specific encoding to reproduce.
The only way I could see to make this encoding-agnostic was to insert the data with the error the bug tries to reproduce.
Failure appears unrelated. Let me know if I should take some action...
--- FAIL: TestLSAN (51.97s)
--- FAIL: TestLSAN/lsan3 (3.77s)
lsan_test.go:44: /home/swarming/.swarming/w/ir/x/t/TestLSANlsan34092314877/001/lsan3 exited with exit status 1
=================================================================
==3621081==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x7e6abeb4ae8f (/usr/lib/x86_64-linux-gnu/libasan.so.6+0xa9e8f)
#1 0x489981 (/home/swarming/.swarming/w/ir/x/t/TestLSANlsan34092314877/001/lsan3+0x489981)
#2 0x47a723 (/home/swarming/.swarming/w/ir/x/t/TestLSANlsan34092314877/001/lsan3+0x47a723)
SUMMARY: AddressSanitizer: 24 byte(s) leaked in 1 allocation(s).
FAIL
FAIL cmd/cgo/internal/testsanitizers 62.077s
| Commit-Queue | +1 |
Thanks very much for working on this. Tons of comments, but overall I'm excited for the CL. The dashboard you made is especially nice.
Please restore the closing ) here and opening const ( for the next block.
The purpose of those is to keep exported and unexported constants separate,
both for readability in source code and to keep godoc from saying something
about unexported things when you run, say, 'go doc flate.HuffmanOnly'.
type compressionLevel struct {
Even unexported symbols - types, variables, methods, everything - need doc comments explaining what they are. Please add throughout. I realize that much of this code predates you, but you understand it best right now and are in the best position to write a sentence or two about each.

good, lazy, nice, chain, level int
In general please put struct fields on their own lines.
And then comment what each one means.

// advancedState contains state for the advanced levels, with bigger hash tables, etc.
s/the advanced levels/levels 7-9/ ?

length int
It looks like a bunch of conversions would disappear if these were maintained as int32, and similarly for the hash arrays below.

ii uint16 // position of last match, intended to overflow to reset.
There must be a better name than 'ii'?

hashMatch [maxMatchLength + minMatchLength]uint32
Unless the window can be > 2GB, use int32 to avoid weird math around 0.

type compressor struct {
Doc comment. Also add a brief comment to each struct field.

window []byte
// sliding window, of length 2*windowSize OR 32kB OR maxStoreBlockSize, depending on level
(This seems error-prone but is at least true.)

func (d *compressor) fillDeflate(b []byte) int {
Doc comment. Mention this is only for levels 7-9.
//copy(d.window[:], d.window[windowSize:2*windowSize])
Restore this code and delete the next line.
If that causes a significant slowdown, please file a cmd/compile issue with numbers and we will take care of it (by making copy compile to that).
// Iterate over slices instead of arrays to avoid copying
I know it's not your code but please change this to:
// Note: range over &array to avoid copy (see go.dev/issue/18625).
... range &s.hashPrev ...
... range &s.hashHead ...
func (d *compressor) writeBlock(tok *tokens, index int, eof bool) error {
Doc comment.

// to determine if the block should be stored on no matches, or
"stored on no matches"?

if int(tok.n) > len(window)-int(tok.n>>6) {
This does not match the comment. Is it supposed to be len(window)-len(window)>>6?

// Try to find a match starting at index whose length is greater than prevSize.
Update doc comment. Currently has wrong format (should start with findMatch) and also refers to variables that don't exist (index, prevSize, chainCount).

if d.chain < 100 {
This and the code below are almost exact copies. Given everything that's going on, it seems like it would not be slower to merge them into one loop that checks d.chain < 100 inside the n > length body to decide whether to apply the additional gain test.

func (d *compressor) writeStoredBlock(buf []byte) error {
Doc comment.

// bulkHash4 will compute hashes using the same
Doc comments should say what it does. Future tense ("will compute") is wrong. Please search for '//.*will' throughout, although I will try to flag them as well.
This specific comment should also explain more clearly what it does.
// bulkHash4 sets dst[i] = hash4(b[i:i+4]) for all i <= len(b)-4.
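The suggested spec can be stated directly as reference code. This is a naive sketch, not the CL's optimized implementation; hashBits and the multiplicative constant are illustrative assumptions.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const hashBits = 17 // hypothetical hash table size exponent

// hash4 hashes the first 4 bytes of b into hashBits bits using a
// Knuth-style multiplicative hash.
func hash4(b []byte) uint32 {
	const prime4bytes = 2654435761
	u := binary.LittleEndian.Uint32(b)
	return (u * prime4bytes) >> (32 - hashBits)
}

// bulkHash4 sets dst[i] = hash4(b[i:i+4]) for all i <= len(b)-4,
// matching the doc comment suggested in the review.
func bulkHash4(b []byte, dst []uint32) {
	for i := 0; i+4 <= len(b); i++ {
		dst[i] = hash4(b[i : i+4])
	}
}

func main() {
	b := []byte("hello world")
	dst := make([]uint32, len(b)-3)
	bulkHash4(b, dst)
	fmt.Println(len(dst), dst[0] == hash4(b[:4]))
}
```

The real routine avoids recomputing the 4-byte load from scratch at every position, but the contract is exactly this loop.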
func (d *compressor) initDeflate() {
// initDeflate initializes d for levels 7-9.

// deflateLazy does encoding with lazy matching.
Explain a bit more what that means.

for {
Please see if you can find a way to write this more clearly. The indentation is 12 levels deep at one point. A few lines of high-level comment about the strategy would help too.
// fillWindow will fill the buffer with data for huffman-only compression.
// fillBlock appends b to d.window, returning the number of bytes copied.
// If n < len(b), the window filled.

// storeHuff will compress and store the currently added data,
// storeHuff compresses and stores the current window
// (if it has filled or if we are in sync or flush).
// It uses the Huffman-only encoding.

// storeFast will compress and store the currently added data,
// storeFast compresses and stores the current window
// (if it has filled or if we are in sync or flush).
// It uses the ... (fill in strategy)

// write will add input byte to the stream.
// write adds b to the compressor.
// It can only return a short length if an error occurs.

func (d *compressor) syncFlush() error {
Doc comment.

d.sync = true
Why did this move up? Without looking at the diff I was going to suggest it belongs just before d.step(d), and now I see that's where it was before. If it is important to put here, then it deserves a clear comment why. (And also should be cleared on error?)
func (d *compressor) init(w io.Writer, level int) (err error) {
Doc comment.

d.step = (*compressor).deflateLazy
The other step functions have names beginning with store; should this be storeDeflate?

func (d *compressor) reset(w io.Writer) {
Doc comment.

d.windowEnd = 0
This can move up after d.err = nil and then be deleted here, at line 762, and at line 773.

// level was NoCompression or ConstantCompression.
ConstantCompression is mentioned here and in one test, but I don't see it defined anywhere.

for i := range s.hashHead {
I was going to suggest using clear here, and I see that the old code did exactly that. If there is a performance problem with clear, please file a bug with a benchmark and we will fix it. We don't want hand optimizations that fight the language. We will make the compiler better as needed instead.
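The clear suggestion can be illustrated with a small sketch (the table size and field are hypothetical stand-ins for the compressor's hash tables): the built-in clear on a slice compiles to an efficient memory clear, so the explicit zeroing loop buys nothing.

```go
package main

import "fmt"

const tableSize = 1 << 15 // hypothetical size for illustration

type state struct {
	hashHead [tableSize]uint32
}

// resetLoop clears the table with an explicit loop (the hand optimization
// being discussed).
func (s *state) resetLoop() {
	for i := range s.hashHead {
		s.hashHead[i] = 0
	}
}

// resetClear does the same with the built-in clear (Go 1.21+), which the
// compiler lowers to an optimized memclr.
func (s *state) resetClear() {
	clear(s.hashHead[:])
}

func main() {
	var s state
	s.hashHead[0], s.hashHead[tableSize-1] = 1, 2
	s.resetClear()
	fmt.Println(s.hashHead[0], s.hashHead[tableSize-1]) // both zero
}
```

Both methods are behaviorally identical; the review's point is that the compiler, not the caller, should be responsible for making the idiomatic form fast.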
func (d *compressor) close() error {
Doc comment.

zw.dict = append(zw.dict, dict...) // duplicate dictionary for Reset method.
zw.dict = slices.Clone(dict) // save copy for Reset

if len(w.dict) > 0 {
It seems like this entire function can be simplified to:
w.d.reset(dst)
w.d.fillWindow(w.dict)
t.Errorf("%d: Deflate(%d, %x) got \n%#v, want \n%#v", i, h.level, h.in, buf.Bytes(), h.out)
The old form with = is the correct idiom. Please restore it instead of using "got".

os.WriteFile("testdata/fails/"+t.Name()+".got", out, os.ModePerm)
Remove (assuming this was for debugging) or put behind a -debug flag.
And then also print the names of the files that were written.
// Remove returns that may be present on Windows
Delete. If there are \r on Windows then it means git has been configured incorrectly and has inserted them, in which case MANY things will break.
We expect that developers have configured git to write the files exactly as in the repo.
text = "hello world Lorem ipsum dolor sit amet"
It seems like the text should contain _something_ that's not in the dictionary (the word "again" before)

testResetOutput(t, fmt.Sprint("level-", i), func(w io.Writer) (*Writer, error) { return NewWriter(w, i) })
"dict=0/level="

testResetOutput(t, fmt.Sprint("dict-level-", i), func(w io.Writer) (*Writer, error) { return NewWriterDict(w, i, dict) })
"dict=1/level="

t.Errorf("got %d, expected %d bytes", len(out2), len(out1))
Removing this return will make the out1[:len(out2)] below panic when len(out2) > len(out1). Please restore or else fix the comparison code below.

t.Skip()
break seems better here. Skip is misleading. Some tests ran. Also the old comparison was clearer. If you write i >= 6 then that means the first 6 run. (If you write i > 5 that also means the first 6 run, but that's confusing.)

previous, current []byte
Please put each field on its own line to avoid the weird spacing.
A comment here giving an overview of what is going on would be helpful.
Pointers to docs about algorithms, strategy for levels, and so on.
tableBits = 15 // Bits used in the table
This is a very indented comment. In a situation like this it is best to insert some blank lines and possibly switch to comments above each value.

// and the previous byte block for level 2.
Only level 2? Not 2 and up?

func (e *fastGen) addBlock(src []byte) int32 {
Doc comment

// copy(e.hist[0:maxMatchOffset], e.hist[offset:])
Use copy; file bug if performance needs fixing.
Note that you may need to write e.hist[offset:offset+maxMatchOffset].

type tableEntryPrev struct {
Doc comment. Also should it be next to tableEntry?

// hashLen returns a hash of the lowest mls bytes of with length output bits.
There are some grammar errors here, and the names are a bit confusing.
```
// hashLen returns a hash of the first n bytes of u, using b output bits.
// It expects 3 <= n <= 8; other values are treated as n == 4.
// The bit length b must be <= 32.
func hashLen(u uint64, b, n int) uint32 { ... }
```
// matchLenLimited will return the match length between offsets and t in src.
s/will return/returns/
Also I think it means "offsets s and t".

// It is assumed that s > t, that t >=0 and s < len(src).
space after >=

// matchlenLong will return the match length between offsets and t in src.
Same comments.

func (e *fastGen) matchlenLong(s, t int, src []byte) int32 {
matchLenLong (capital L to match the others)

// Reset the encoding table.
// reset resets the encoding table to prepare for a new compression stream.

func (e *fastGen) reset() {
Move this function elsewhere, so that matchLenLimited, matchLenLong, and matchLen are all next to each other. As written right now, reset interrupts the flow. Perhaps reset should be up near the definition of fastGen.

b = b[n:]
I think if you add
b = b[:len(a)]
on the next line, that should get rid of the bounds checks on b in the loop.
// Used to get the embedded fastGen from each level struct.
You can define just
func (f *fastGen) getFastGen() *fastGen { return f }
and then it will be inherited from the embedding. No need to say it 6 times.
Also move it up near the definition of fastGen.
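The embedding suggestion works through Go's method promotion, sketched here with hypothetical level structs (the real ones carry their own hash tables):

```go
package main

import "fmt"

// fastGen stands in for the state shared by all the level encoders.
type fastGen struct {
	cur int32
}

// getFastGen is defined once on fastGen; every struct that embeds fastGen
// inherits it through method promotion, so it need not be repeated per level.
func (f *fastGen) getFastGen() *fastGen { return f }

// fastEncL1 and fastEncL2 are hypothetical level structs embedding fastGen.
type fastEncL1 struct{ fastGen }
type fastEncL2 struct{ fastGen }

func main() {
	e1 := &fastEncL1{}
	e1.cur = 42
	fmt.Println(e1.getFastGen().cur) // promoted from the embedded fastGen

	e2 := &fastEncL2{}
	fmt.Println(e2.getFastGen() == &e2.fastGen) // same embedded value
}
```

One definition replaces six identical methods, which is exactly the deduplication the review asks for.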
Restore this blank line to avoid a spurious diff.
zw, err := flate.NewWriterDict(&b, flate.BestCompression, []byte(dict))
change back to DefaultCompression (no need to change the example, nor to suggest that people use the expensive one when copying this example)

defer wp.Close()
Remove. This should not be necessary. If it is necessary, then we have a problem, since it means that flate is reading from the pipe beyond the end of the compressed stream.

// Minimum length code that emits bits.
// lengthExtraBitsMinCode is ...

// The number of extra bits needed by length code X - LENGTH_CODES_START.
// lengthExtraBits[i] is the number of extra bits needed by
// length code i - lengthCodesStart.
(I know you didn't write this but let's clean it up.)

// The length indicated by length code X - LENGTH_CODES_START.
// lengthBase[i] is the length indicated by length code i - lengthCodesStart.

// Minimum offset code that emits bits.
// offsetExtraBitsMinCode ...

func init() {
Avoid init-time work. Define offsetCombined as a static array.
Fine to add a test that has this computation in it to check that it's correct,
if you're worried about that.
// The odd order in which the codegen code sizes are written.
// codegenOrder is the order in which codegen code sizes are written.

lastHuffMan bool
A better name would help and certainly a lowercase m.
Maybe 'wroteHuffman'?

lastHeader int
I suggest making this 'prevHeader', since you are using last to mean previous and not last to mean final.

// This is controlled by several variables:
Move this discussion into the struct, to document the variables.
Line comments above variable names are fine.
func newHuffmanBitWriter(w io.Writer) *huffmanBitWriter {
Doc comment

func (w *huffmanBitWriter) reset(writer io.Writer) {
Doc comment

func (w *huffmanBitWriter) canReuse(t *tokens) (ok bool) {
Doc comment

func (w *huffmanBitWriter) write(b []byte) {
Doc comment

func (w *huffmanBitWriter) writeBits(b int32, nb uint8) {
Doc comment

w.bits |= uint64(b) << (w.nbits & 63)
shiftMask?

func (w *huffmanBitWriter) writeBytes(bytes []byte) {
Doc comment

for i := range w.codegenFreq {
Use clear; file a bug if performance isn't good enough.

func (w *huffmanBitWriter) codegens() int {
Doc comment.

func (w *huffmanBitWriter) headerSize() (size, numCodegens int) {
Doc comment.

// extraBitSize will return the number of bits that will be written
s/will return/returns/

// Inline manually when performance is critical.
Not wild about this comment. Let's figure out how to avoid this. Perhaps something like
```
type bits struct {
b uint64
nb uint
}
func (b bits) writeCode(c hcode, w *huffmanBitWriter) bits {
b.b |= c.code64() << (b.nb & shiftMask)
if b.nb += c.len(); b.nb >= 48 {
w.flushBits(b.b)
b.b >>= 48
b.nb -= 48
}
return b
}
```
That should inline just fine. It might even work to use a pointer method and drop the return, but I'm not sure.
Then the callers who want to move the bit writing state to the stack can do
```
bits := w.bits
... bits = bits.writeCode(c, w) ...
w.bits = bits
```
// writeOutBits will write bits to the buffer.
"write out" is vague. flushBits?

// writeStoredHeader will write a stored header.
s/will write/writes/

// writeBlock will write a block of tokens with the smallest encoding.
// writeBlock writes a block of tokens using the smallest encoding.

// The original input can be supplied, and if the huffman encoded data
s/huffman encoded/Huffman-encoded/

// We cannot reuse pure huffman table, and must mark as EOF.
s/huffman/Huffman/

// Check if we should reuse.
// Check whether we should reuse the previous Huffman table.

// indexTokens indexes a slice of tokens, and updates
// indexTokens indexes a slice of tokens, updates literalFreq and offsetFreq,
// and generates literalEncoding and offsetEncoding.
// It returns the number of literal and offset tokens.

func (w *huffmanBitWriter) generate() {
Doc comment

// codes for literal and offset encoding must be supplied.
s/codes/Codes/

func (w *huffmanBitWriter) writeTokens(tokens []token, leCodes, oeCodes []hcode) {
s/leCodes, oeCodes/lenCodes, offCodes/

// Go 1.16 LOVES having these on stack.
As noted above, let's find a way to avoid this duplication. We should be able to get this down to
```
bits := w.bits
for _, t := range tokens {
if t < 256 {
bits = bits.writeCode(lits[t], w)
continue
}
...
}
w.bits = bits
```
// huffOffset is a static offset encoder used for huffman only encoding.
s/huffman only/Huffman-only/

var huffOffset *huffmanEncoder
```
var huffOffset = sync.OnceValue(func() *huffmanEncoder {
w := newHuffmanBitWriter(nil)
w.offsetFreq[0] = 1
h := newHuffmanEncoder(offsetCodeCount)
h.generate(w.offsetFreq[:offsetCodeCount], 15)
return h
})
```
That will avoid init-time work.
// Huffman encoded literals or uncompressed bytes if the
Huffman-encoded

// results only gains very little from compression.
s/only gains/gain/

// Add everything as literals
// Estimate size of literal encoding.
(delete rest of comment)

const guessHeaderSizeBits = 70 * 8
const guessHeaderSizeBits = 70 * 8 // 70 bytes; see https://stackoverflow.com/a/25454430
// Quick check for incompressible content.
I am confused about what this is doing. It is checking whether
sum_i (f_i - n/256)^2 < 2n
where i ranges over 0..255 and f_i is the frequency of byte i.
But I don't really understand why it makes sense to compare the sum of the squares of the deviations from perfect uniformity with 2n.
Is there some reference to this algorithm that can be added as a comment?
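For concreteness, the check under discussion can be written out as a standalone sketch (the function name is hypothetical, and the 2n threshold is taken from the quoted code, not justified here): histogram the input and compare the squared deviation from a uniform byte distribution against 2n.

```go
package main

import "fmt"

// looksIncompressible reports whether the byte histogram of b deviates so
// little from uniform that compression is unlikely to help, using the
// sum_i (f_i - n/256)^2 < 2n test quoted in the review.
func looksIncompressible(b []byte) bool {
	var freq [256]int
	for _, c := range b {
		freq[c]++
	}
	n := len(b)
	avg := float64(n) / 256
	var sum float64
	for _, f := range freq {
		d := float64(f) - avg
		sum += d * d
	}
	return sum < 2*float64(n)
}

func main() {
	uniform := make([]byte, 4096)
	for i := range uniform {
		uniform[i] = byte(i) // every byte value appears equally often
	}
	skewed := make([]byte, 4096) // all zeros: highly compressible
	fmt.Println(looksIncompressible(uniform), looksIncompressible(skewed))
}
```

Perfectly uniform data gives a sum of 0, while heavily skewed data blows far past 2n, so the test separates the two extremes; the reviewer's question about why 2n specifically is the right cutoff remains open.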
if abs > max {
Since the test below is abs < max, this should be abs >= max.
w.literalEncoding, w.tmpLitEncoding = w.tmpLitEncoding, w.literalEncoding
When do these get un-swapped?

if w.nbits >= 48 {
How can this be? Doesn't everything that adds to bits check and flush?

*h = hcode(length) | (hcode(code) << 8)
*h = newhcode(code, length)
or just delete this method entirely and write that where it gets used
// Allocate a reusable buffer with the longest possible frequency table.
// freqcache is a reusable ...

func newHuffmanEncoder(size int) *huffmanEncoder {
Doc comment

func reverseBits(number uint16, bitLength byte) uint16 {
Doc comment. Also rename the arguments to x and b, which will make the comment and code clearer.

func generateFixedLiteralEncoding() *huffmanEncoder {
// generateFixedLiteralEncoding returns the encoder for the fixed literal table.

var fixedLiteralEncoding = generateFixedLiteralEncoding()
```
var (
fixedLiteralEncoding = sync.OnceValue(generateFixedLiteralEncoding)
fixedOffsetEncoding = sync.OnceValue(generateFixedOffsetEncoding)
)
```
to avoid init time work
func (h *huffmanEncoder) bitLength(freq []uint16) int {
Doc comment

func (h *huffmanEncoder) bitLengthRaw(b []byte) int {
Doc comment

// canReuseBits returns the number of bits or math.MaxInt32 if the encoder cannot be reused.
// canReuseBits returns the number of bits to encode freq.
// It returns math.MaxInt32 if freq cannot be encoded.
At that point I think it should be something like 'canEncodeLen'?

// Return the number of literals assigned to each bit size in the Huffman encoding
Not sure how it got this way but the actual doc comment starts on line 179 and should be brought up.
// bitCounts returns an integer slice in which slice[i] is the number
// of literals that should be encoded using i bits.
//
// This method is only called when ...
// Descending to only have 1 bounds check.
_ = list[2] // check bounds here instead of in loop
(and then put the old code back)

// Look at the leaves and assign them a bit count and an encoding as specified
// assignEncodingAndSize assigns bit counts and encodings to the leaves
// as specified in RFC 1951 3.2.2.

// Update this Huffman Code object to be the minimum code for the specified frequency count.
// generate rewrites h to be the Huffman code for the given frequency count.
// freq[i] is the frequency of literal i, and maxBits is the maximum number
// of bits to use for any literal.

// atLeastOne clamps the result between 1 and 15.
Rename to 'clamp1to15'.
atLeastOne should be max(v, 1).
Or maybe
```
func clamp(lo, x, hi float32) float32 {
return min(max(lo, x), hi)
}
```
and then use clamp(1, x, 15) at the call sites.
histogramSplit(b, h)
return after this and drop else

// Tested, and slightly faster than 2-way.
// Walk four quarters in parallel.
// Tested to be faster than walking halves.

x, y, z, w := b[:n], b[n:], b[n+n:], b[n+n+n:]
The compiler is good enough that this should be fine:
```
b0, b1, b2, b3 := b[0*n:][:n], b[1*n:][:n], b[2*n:][:n], b[3*n:][:n]
for i, t := range b0 {
h0 := &h[t]
h1 := &h[b1[i]]
h2 := &h[b2[i]]
h3 := &h[b3[i]]
*h0++
*h1++
*h2++
*h3++
}
```
I noticed that you initialized v0,v1,v3,v2 (not v2,v3). If that's important it needs a detailed comment and a benchmark. :-)
It would help to have some guide here, or at the bottom of deflatefast.go, explaining the algorithm used for each level and why each level is different.
I tried
diff level1.go level2.go
diff level2.go level3.go
diff level3.go level4.go
diff level4.go level5.go
diff level5.go level6.go
and it almost looks like every level is bespoke, without an overarching structure and organization. Of course I am sure that's not the case. It would help a lot to explain that in a comment, and also unnecessary differences in those diffs should be minimized, if you see any when you run those commands.
const (
// shiftMask is a no-op shift mask for x86-64.
// Using it lets the compiler omit the check for shift size >= 64.
const shiftMask = 63
const (
// shiftMask is a no-op shift mask for non-x86-64.
// The compiler will optimize it away.
const shiftMask = 0xFF
func indexTokens(in []token) tokens {
Doc comment.

func (t *tokens) indexTokens(in []token) {
Doc comment.

func (t *tokens) AddLiteral(lit byte) {
Doc comment.

// from https://stackoverflow.com/a/28730362
Doc comment.
// EstimatedBits will return a minimum size estimated by an *optimal*
// EstimatedBits returns an estimated minimum size for the
// optimal compression of t.

// Returns the type of a token
// typ returns ...

// Returns the literal of a literal token
// literal returns the literal value of t.

// Returns the extra offset of a match token
// offset returns....

// Returns the offset code corresponding to a specific offset
// offsetCode ...
b.Run(fmt.Sprint("level-", level), func(b *testing.B) {
Please use level=, which will work better with benchstat.

t.Run(fmt.Sprintf("level-%d", l), func(t *testing.T) {
level=

buff := []byte{120, 156, 202, 72, 205, 201, 201, 215, 81, 40, 207,
Please update to match ExampleNewWriter.
// Insert zlib compressed sec.Data() block with `[]byte{1, 0, 0, 0}` as the first 4 bytes
Is this change necessary? Did the old code stop working?
The previous test relied on changing specific bytes in the *compressed* data, so it relied on a specific encoding to reproduce.
The only way I could see to make this encoding-agnostic was to insert the data with the error the bug tries to reproduce.
Thanks for the review! Skimming through it, it all seems quite reasonable. I will probably take a few days to go through them, so expect an update some time next week.
| Commit-Queue | +1 |
Updated the majority, but leaving some of the riskier changes for separate evaluation. Ran fuzz test for 16h.
Thanks for taking the time for this. It is great to get eyes on things I have been working solo on for years.
Please restore the closing ) here and opening const ( for the next block.
The purpose of those is to keep exported and unexported constants separate,
both for readability in source code and to keep godoc from saying something
about unexported things when you run, say, 'go doc flate.HuffmanOnly'.
Acknowledged
Even unexported symbols - types, variables, methods, everything - need doc comments explaining what they are. Please add throughout. I realize that much of this code predates you, but you understand it best right now and are in the best position to write a sentence or two about each.
Acknowledged
In general please put struct fields on their own lines.
And then comment what each one means.
Acknowledged
// advancedState contains state for the advanced levels, with bigger hash tables, etc.
s/the advanced levels/levels 7-9/ ?
Acknowledged
length int
It looks like a bunch of conversions would disappear if these were maintained as int32, and similarly for the hash arrays below.
Yes, should be pretty safe, since it will be the same on 32 bits. I will do it separately to evaluate if we get more/less conversions - since slice length, etc would need converting.
ii uint16 // position of last match, intended to overflow to reset.
There must be a better name than 'ii'?
Acknowledged. Renamed ii to literalCounter
hashMatch [maxMatchLength + minMatchLength]uint32
Unless the window can be > 2GB, use int32 to avoid weird math around 0.
Will attempt as separate commit.
Doc comment. Also add a brief comment to each struct field.
Acknowledged
// sliding window, of length 2*windowSize OR 32kB OR maxStoreBlockSize, depending on level
(This seems error-prone but is at least true.)
Acknowledged. Went for a simpler: `// current window - size depends on encoder level`
Doc comment. Mention this is only for levels 7-9.
Acknowledged
Restore this code and delete the next line.
If that causes a significant slowdown, please file a cmd/compile issue with numbers and we will take care of it (by making copy compile to that).
Acknowledged
I know it's not your code but please change this to:
// Note: range over &array to avoid copy (see go.dev/issue/18625).
... range &s.hashPrev ...
... range &s.hashHead ...
Acknowledged
func (d *compressor) writeBlock(tok *tokens, index int, eof bool) error {
Doc comment.
Acknowledged
// to determine if the block should be stored on no matches, or
"stored on no matches"?
Acknowledged
This does not match the comment. Is it supposed to be len(window)-len(window)>>6?
Acknowledged. It should be functionally the same (within a few bytes), but you are right it is clearer.
// Try to find a match starting at index whose length is greater than prevSize.
Update doc comment. Currently has wrong format (should start with findMatch) and also refers to variables that don't exist (index, prevSize, chainCount).
Acknowledged
This and the code below are almost exact copies. Given everything that's going on, it seems like it would not be slower to merge them into one loop that checks d.chain < 100 inside the n > length body to decide whether to apply the additional gain test.
It is an inner loop branch, but it would be predictable on most current CPUs. Merged the loops.
func (d *compressor) writeStoredBlock(buf []byte) error {
Doc comment.
Acknowledged
Doc comments should say what it does. Future tense ("will compute") is wrong. Please search for '//.*will' throughout, although I will try to flag them as well.
This specific comment should also explain more clearly what it does.
// bulkHash4 sets dst[i] = hash4(b[i:i+4]) for all i <= len(b)-4.
Acknowledged
// initDeflate initializes d for levels 7-9.
Acknowledged
Explain a bit more what that means.
Acknowledged
for {
Please see if you can find a way to write this more clearly. The indentation is 12 levels deep at one point. A few lines of high-level comment about the strategy would help too.
Will attempt as separate commit.
func (d *compressor) store() {
Doc comment.
Acknowledged
// fillWindow will fill the buffer with data for huffman-only compression.
// fillBlock appends b to d.window, returning the number of bytes copied.
// If n < len(b), the window filled.
Acknowledged
// storeHuff will compress and store the currently added data,
// storeHuff compresses and stores the current window
// (if it has filled or if we are in sync or flush).
// It uses the Huffman-only encoding.
Acknowledged
// storeFast will compress and store the currently added data,
// storeFast compresses and stores the current window
// (if it has filled or if we are in sync or flush).
// It uses the ... (fill in strategy)
Acknowledged. Rephrased a bit.
// write adds b to the compressor.
// It can only return a short length if an error occurs.
Acknowledged
func (d *compressor) syncFlush() error {
Doc comment.
Acknowledged
Why did this move up? Without looking at the diff I was going to suggest it belongs just before d.step(d), and now I see that's where it was before. If it is important to put here, then it deserves a clear comment why. (And also should be cleared on error?)
Acknowledged. Seems to be a fix that was never backported.
func (d *compressor) init(w io.Writer, level int) (err error) {
Doc comment.
Acknowledged
The other step functions have names beginning with store; should this be storeDeflate?
I think I'd rather do 'deflateFast', and 'deflateHuff'. That is a better representation of what is going on.
func (d *compressor) reset(w io.Writer) {
Doc comment.
Acknowledged
This can move up after d.err = nil and then deleted here, line 762, and line 773.
Acknowledged
ConstantCompression is mentioned here and in one test, but I don't see it defined anywhere.
Acknowledged
I was going to suggest using clear here, and I see that the old code did exactly that. If there is a performance problem with clear, please file a bug with a benchmark and we will fix it. We don't want hand optimizations that fight the language. We will make the compiler better as needed instead.
Acknowledged. Will do an overall benchmark to check for regressions.
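The suggestion above is to use the language's built-in `clear` instead of hand-rolled zeroing loops. A minimal sketch of what that replacement looks like (Go 1.21+; the slice and map here are illustrative, not the actual fields in the CL):

```go
package main

import "fmt"

func main() {
	// clear zeroes every element of a slice in place, replacing
	// manual loops like: for i := range s { s[i] = 0 }.
	s := []uint32{1, 2, 3, 4}
	clear(s)
	fmt.Println(s) // [0 0 0 0]

	// On maps, clear deletes all entries.
	m := map[string]int{"a": 1}
	clear(m)
	fmt.Println(len(m)) // 0
}
```

The compiler lowers `clear` on byte-like slices to an efficient memclr, which is why the review asks for a bug report rather than a hand optimization if it benchmarks poorly.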
func (d *compressor) close() error {
Doc comment.
Acknowledged
zw.dict = append(zw.dict, dict...) // duplicate dictionary for Reset method.
Suggested:
zw.dict = slices.Clone(dict) // save copy for Reset
Acknowledged (added another comment)
It seems like this entire function can be simplified to:
w.d.reset(dst)
w.d.fillWindow(w.dict)
Acknowledged
t.Errorf("%d: Deflate(%d, %x) got \n%#v, want \n%#v", i, h.level, h.in, buf.Bytes(), h.out)
The old form with = is the correct idiom. Please restore it instead of using "got".
Acknowledged
os.WriteFile("testdata/fails/"+t.Name()+".got", out, os.ModePerm)
Remove (assuming this was for debugging) or put behind a -debug flag.
And then also print the names of the files that were written.
Acknowledged
Delete. If there are \r on Windows then it means git has been configured incorrectly and has inserted them, in which case MANY things will break.
We expect that developers have configured git to write the files exactly as in the repo.
Acknowledged. Yeah, seem to remember this was a problem on a CI I used a while back.
It seems like the text should contain _something_ that's not in the dictionary (the word "again" before)
Acknowledged
testResetOutput(t, fmt.Sprint("level-", i), func(w io.Writer) (*Writer, error) { return NewWriter(w, i) })
Use "dict=0/level=".
Acknowledged
testResetOutput(t, fmt.Sprint("dict-level-", i), func(w io.Writer) (*Writer, error) { return NewWriterDict(w, i, dict) })
Use "dict=1/level=".
Acknowledged
t.Errorf("got %d, expected %d bytes", len(out2), len(out1))
Removing this return will make the out1[:len(out2)] below panic when len(out2) > len(out1). Please restore or else fix the comparison code below.
Acknowledged
break seems better here. Skip is misleading. Some tests ran. Also the old comparison was clearer. If you write i >= 6 then that means the first 6 run. (If you write i > 5 that also means the first 6 run, but that's confusing.)
Acknowledged
Please put each field on its own line to avoid the weird spacing.
Acknowledged
A comment here giving an overview of what is going on would be helpful.
Pointers to docs about algorithms, strategy for levels, and so on.
Acknowledged
func newFastEnc(level int) fastEnc {
Doc comment.
Acknowledged
This is a very indented comment. In a situation like this it is best to insert some blank lines and possibly switch to comments above each value.
Acknowledged
Only level 2? Not 2 and up?
Acknowledged. Outdated docs.
func (e *fastGen) addBlock(src []byte) int32 {
Doc comment.
Acknowledged
Use copy; file bug if performance needs fixing.
Note that you may need to write e.hist[offset:offset+maxMatchOffset].
Acknowledged
Doc comment. Also should it be next to tableEntry?
Acknowledged
// hashLen returns a hash of the lowest mls bytes of with length output bits.
There are some grammar errors here, and the names are a bit confusing. Suggested:
```
// hashLen returns a hash of the first n bytes of u, using b output bits.
// It expects 3 <= n <= 8; other values are treated as n == 4.
// The bit length b must be <= 32.
func hashLen(u uint64, b, n int) uint32 { ... }
```
Acknowledged
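To make the suggested doc comment concrete, here is a hedged sketch of a function matching that contract: mask off the low n bytes, multiply by a large odd constant, and keep the top b bits. The multiplier is illustrative, not the constant used in the actual encoder.

```go
package main

import "fmt"

// hashLen returns a hash of the first n bytes of u, using b output bits.
// It expects 3 <= n <= 8; other values are treated as n == 4.
// The bit length b must be <= 32.
func hashLen(u uint64, b, n int) uint32 {
	if n < 3 || n > 8 {
		n = 4
	}
	mask := ^uint64(0) >> (64 - 8*uint(n)) // keep only the low n bytes
	const mul = 0x9E3779B97F4A7C15         // illustrative large odd multiplier
	return uint32(((u & mask) * mul) >> (64 - uint(b)))
}

func main() {
	// Bytes above the low 4 must not affect the hash when n == 4.
	fmt.Println(hashLen(0x0102030405, 15, 4) == hashLen(0xFF02030405, 15, 4)) // true
}
```

Taking the top bits of the product (rather than the bottom) is what makes the multiplicative hash mix well for nearby inputs.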
// matchLenLimited will return the match length between offsets and t in src.
s/will return/returns/
Also I think it means "offsets s and t".
Acknowledged
// It is assumed that s > t, that t >=0 and s < len(src).
Space after >=.
Acknowledged
// matchlenLong will return the match length between offsets and t in src.
Same comments.
Acknowledged
func (e *fastGen) matchlenLong(s, t int, src []byte) int32 {
matchLenLong (capital L to match the others)
Acknowledged
// reset resets the encoding table to prepare for a new compression stream.
Acknowledged
Move this function elsewhere, so that matchLenLimited, matchLenLong, and matchLen are all next to each other. As written right now, reset interrupts the flow. Perhaps reset should be up near the definition of fastGen.
ACK. Grouped all 'fastGen' and moved types/functions used for encoders to end.
I think if you add
b = b[:len(a)]
on the next line, that should get rid of the bounds checks on b in the loop.
Acknowledged. This is only hit at the very end of a window, but you are absolutely right!
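The bounds-check-elimination trick above can be sketched as follows. The caller must guarantee len(b) >= len(a); the reslice lets the compiler prove b[i] is in range inside the loop (a simplified stand-in for the CL's match-length code, not the actual implementation):

```go
package main

import "fmt"

// matchLen returns the number of leading bytes at which a and b agree.
// The b = b[:len(a)] reslice is the suggested trick: after it, the
// compiler knows len(b) == len(a) and drops the bounds check on b[i].
func matchLen(a, b []byte) int {
	b = b[:len(a)]
	for i, av := range a {
		if av != b[i] {
			return i
		}
	}
	return len(a)
}

func main() {
	fmt.Println(matchLen([]byte("gopher"), []byte("gophers"))) // 6
	fmt.Println(matchLen([]byte("help!"), []byte("hello")))    // 3
}
```

This can be confirmed with `go build -gcflags=-d=ssa/check_bce/debug=1`, which reports which bounds checks remain.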
// Used to get the embedded fastGen from each level struct.
You can define just
func (f *fastGen) getFastGen() *fastGen { return f }
and then it will be inherited from the embedding. No need to say it 6 times.
Also move it up near the definition of fastGen.
Acknowledged
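The suggestion relies on Go's method promotion: a method defined once on the embedded type is inherited by every struct that embeds it. A minimal sketch (the field and struct shapes are illustrative, not the CL's actual types):

```go
package main

import "fmt"

type fastGen struct{ cur int32 }

// getFastGen is defined once on *fastGen; every struct embedding
// fastGen inherits it via method promotion, so the six per-level
// copies collapse to this single definition.
func (f *fastGen) getFastGen() *fastGen { return f }

type fastEncL1 struct {
	fastGen
	table [64]uint32
}

func main() {
	e := &fastEncL1{}
	e.cur = 7 // promoted field access
	fmt.Println(e.getFastGen().cur) // 7
}
```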
Restore this blank line to avoid a spurious diff.
Acknowledged
zw, err := flate.NewWriterDict(&b, flate.BestCompression, []byte(dict))
Change back to DefaultCompression (no need to change the example, nor to suggest that people use the expensive one when copying this example).
Acknowledged
Remove. This should not be necessary. If it is necessary, then we have a problem, since it means that flate is reading from the pipe beyond the end of the compressed stream.
Acknowledged
// Minimum length code that emits bits.
Suggested: // lengthExtraBitsMinCode is ...
Acknowledged
// The number of extra bits needed by length code X - LENGTH_CODES_START.
Suggested:
// lengthExtraBits[i] is the number of extra bits needed by
// length code i - lengthCodesStart.
(I know you didn't write this but let's clean it up.)
Acknowledged. No worries, might as well take the low-hanging cleanup 😊
// The length indicated by length code X - LENGTH_CODES_START.
Suggested: // lengthBase[i] is the length indicated by length code i - lengthCodesStart.
Acknowledged
// Minimum offset code that emits bits.
Suggested: // offsetExtraBitsMinCode ...
Acknowledged
// offset code word extra bits.
Fix doc comment.
Acknowledged
func init() {
Avoid init-time work. Define offsetCombined as a static array.
Fine to add a test that has this computation in it to check that it's correct,
if you're worried about that.
Acknowledged. Added the table, but kept the generation code as a comment, to make the values less "magic". Please check if you find that a reasonable approach.
// The odd order in which the codegen code sizes are written.
Suggested: // codegenOrder is the order in which codegen code sizes are written.
Acknowledged
type huffmanBitWriter struct {
Doc comment.
Acknowledged
A better name would help and certainly a lowercase m.
Maybe 'wroteHuffman'?
Acknowledged
I suggest making this 'prevHeader', since you are using last to mean previous and not last to mean final.
Acknowledged
Move this discussion into the struct, to document the variables.
Line comments above variable names are fine.
Acknowledged. Tried to make it as clear as possible.
func newHuffmanBitWriter(w io.Writer) *huffmanBitWriter {
Doc comment.
Acknowledged
func (w *huffmanBitWriter) reset(writer io.Writer) {
Doc comment.
Acknowledged
func (w *huffmanBitWriter) canReuse(t *tokens) (ok bool) {
Doc comment.
Acknowledged
func (w *huffmanBitWriter) flush() {
Doc comment.
Acknowledged
func (w *huffmanBitWriter) write(b []byte) {
Doc comment.
Acknowledged
func (w *huffmanBitWriter) writeBits(b int32, nb uint8) {
Doc comment.
Acknowledged
w.bits |= uint64(b) << (w.nbits & 63)
shiftMask?
Acknowledged
func (w *huffmanBitWriter) writeBytes(bytes []byte) {
Doc comment.
Acknowledged
Use clear; file a bug if performance isn't good enough.
Acknowledged
func (w *huffmanBitWriter) codegens() int {
Doc comment.
Acknowledged
func (w *huffmanBitWriter) headerSize() (size, numCodegens int) {
Doc comment.
Acknowledged
// extraBitSize will return the number of bits that will be written
s/will return/returns/
Acknowledged
// Inline manually when performance is critical.
Not wild about this comment. Let's figure out how to avoid this. Perhaps something like
```
type bits struct {
b uint64
nb uint
}

func (b bits) writeCode(c hcode, w *huffmanBitWriter) bits {
b.b |= c.code64() << (b.nb & shiftMask)
if b.nb += c.len(); b.nb >= 48 {
w.flushBits(b.b)
b.b >>= 48
b.nb -= 48
}
return b
}
```
That should inline just fine. It might even work to use a pointer method and drop the return, but I'm not sure.
Then the callers who want to move the bit writing state to the stack can do
```
bits := w.bits
... bits = bits.writeCode(c, w) ...
w.bits = bits
```
I will do this separately.
"write out" is vague. flushBits?
Acknowledged
// writeStoredHeader will write a stored header.
s/will write/writes/
Acknowledged
// writeBlock will write a block of tokens with the smallest encoding.
Suggested: // writeBlock writes a block of tokens using the smallest encoding.
Acknowledged
// The original input can be supplied, and if the huffman encoded data
s/huffman encoded/Huffman-encoded/
Acknowledged
// input size the block is stored.
s/size/size,/
I think this comment is outdated. At best it is incomplete, so removing it. The code flow attempts to explain the decisions made anyway.
// We cannot reuse pure huffman table, and must mark as EOF.
s/huffman/Huffman/
Acknowledged
// Check whether we should reuse the previous Huffman table.
Acknowledged
// indexTokens indexes a slice of tokens, updates literalFreq and offsetFreq,
// and generates literalEncoding and offsetEncoding.
// It returns the number of literal and offset tokens.
Acknowledged
func (w *huffmanBitWriter) generate() {
Doc comment.
Acknowledged
// codes for literal and offset encoding must be supplied.
s/codes/Codes/
Acknowledged
func (w *huffmanBitWriter) writeTokens(tokens []token, leCodes, oeCodes []hcode) {
s/leCodes, oeCodes/lenCodes, offCodes/
Acknowledged
// Go 1.16 LOVES having these on stack.
As noted above, let's find a way to avoid this duplication. We should be able to get this down to
```
bits := w.bits
for _, t := range tokens {
if t < 256 {
bits = bits.writeCode(lits[t], w)
continue
}
...
}
w.bits = bits
```
I have tried *so* many variations on this piece of code, since it is a big proportion of the lower-level encoding. If you don't mind, I would rather keep this as something to evaluate later, where we can do a clean before/after and then decide if the cleanup is worth any potential penalty.
// huffOffset is a static offset encoder used for huffman only encoding.
s/huffman only/Huffman-only/
Acknowledged
```
var huffOffset = sync.OnceValue(func() *huffmanEncoder {
w := newHuffmanBitWriter(nil)
w.offsetFreq[0] = 1
h := newHuffmanEncoder(offsetCodeCount)
h.generate(w.offsetFreq[:offsetCodeCount], 15)
return h
})
```
That will avoid init-time work.
Acknowledged
// Huffman encoded literals or uncompressed bytes if the
Huffman-encoded
Acknowledged
// results only gains very little from compression.
s/only gains/gain/
Acknowledged. Gain/gains is hard for non-natives 😊
// Estimate size of literal encoding.
(delete rest of comment)
Acknowledged
const guessHeaderSizeBits = 70 * 8 // 70 bytes; see https://stackoverflow.com/a/25454430
Acknowledged
// Quick check for incompressible content.
I am confused about what this is doing. It is checking whether
sum_i (f_i - n/256)^2 < 2n
where i ranges over 0..255 and f_i is the frequency of byte i.
But I don't really understand why it makes sense to compare the sum of the squares of the deviations from perfect uniformity with 2n. Is there some reference to this algorithm that can be added as a comment?
It is mostly made by experimentation. It was a few years back, so the exact details escape me.
The main goal was to detect cases where distribution was so flat that Huffman would not be able to compress anything and quickly bail out without generating a table for an exact count.
The best I recall, I experimented by using pure Huffman and simply adjusting the threshold value until various inputs would start showing compression drops, but still catch random data reliably.
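The heuristic under discussion can be sketched as follows. This is an illustrative reconstruction of the check described above (sum of squared deviations from a flat histogram compared against 2n), not the CL's exact code; the threshold is empirical, as Klaus notes, not from a published algorithm.

```go
package main

import "fmt"

// looksIncompressible reports whether the byte histogram of b is so
// close to uniform that Huffman coding is unlikely to help. It
// compares sum_i (f_i - n/256)^2 against 2n, where f_i is the
// frequency of byte i and n is len(b).
func looksIncompressible(b []byte) bool {
	var freq [256]int
	for _, c := range b {
		freq[c]++
	}
	n := float64(len(b))
	avg := n / 256
	var sum float64
	for _, f := range freq {
		d := float64(f) - avg
		sum += d * d
	}
	return sum < 2*n
}

func main() {
	flat := make([]byte, 1024)
	for i := range flat {
		flat[i] = byte(i) // perfectly uniform histogram
	}
	skewed := make([]byte, 1024) // all zero bytes: maximally skewed
	fmt.Println(looksIncompressible(flat), looksIncompressible(skewed)) // true false
}
```

A perfectly uniform input has sum 0 and is flagged incompressible; a single-byte input has a huge sum and is not, which matches the stated goal of bailing out early on random-looking data.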
Since the test below is abs < max, this should be abs >= max.
Acknowledged
w.literalEncoding, w.tmpLitEncoding = w.tmpLitEncoding, w.literalEncoding
When do these get un-swapped?
They are swapped, so the next call to writeBlockHuff can use `w.tmpLitEncoding` non-destructively in `w.tmpLitEncoding.generate(w.literalFreq[:numLiterals], 15)`.
This allows us to reuse the previous encoding if a new one is deemed too expensive in `if estBits < reuseSize {`.
So this will swap back and forth by itself.
if w.nbits >= 48 {
How can this be? Doesn't everything that adds to bits check and flush?
nbits are checked before writing, since various writes have different limits.
func (h hcode) len() uint8 {
Doc comment.
Acknowledged
func (h hcode) code64() uint64 {
Doc comment.
Acknowledged
func (h hcode) zero() bool {
Doc comment.
Acknowledged
*h = newhcode(code, length)
or just delete this method entirely and write that where it gets used
Acknowledged
type huffmanEncoder struct {
Doc comment.
Acknowledged
// Allocate a reusable buffer with the longest possible frequency table.
Suggested: // freqcache is a reusable ...
Acknowledged
func newHuffmanEncoder(size int) *huffmanEncoder {
Doc comment.
Acknowledged
Doc comment. Also rename the arguments to x and b, which will make the comment and code clearer.
Acknowledged
// generateFixedLiteralEncoding returns the encoder for the fixed literal table.
Acknowledged
```
var (
fixedLiteralEncoding = sync.OnceValue(generateFixedLiteralEncoding)
fixedOffsetEncoding = sync.OnceValue(generateFixedOffsetEncoding)
)
```
to avoid init-time work
Acknowledged
func (h *huffmanEncoder) bitLength(freq []uint16) int {
Doc comment.
Acknowledged
func (h *huffmanEncoder) bitLengthRaw(b []byte) int {
Doc comment.
Acknowledged
// canReuseBits returns the number of bits or math.MaxInt32 if the encoder cannot be reused.
Suggested:
// canReuseBits returns the number of bits to encode freq.
// It returns math.MaxInt32 if freq cannot be encoded.
At that point I think it should be something like 'canEncodeLen'?
Acknowledged
// Return the number of literals assigned to each bit size in the Huffman encoding
Not sure how it got this way but the actual doc comment starts on line 179 and should be brought up. Suggested:
// bitCounts returns an integer slice in which slice[i] is the number
// of literals that should be encoded using i bits.
//
// This method is only called when ...
Acknowledged
// Descending to only have 1 bounds check.
Suggested:
_ = list[2] // check bounds here instead of in loop
(and then put the old code back)
Will the compiler be able to move the lookup itself out of the loop? When I wrote this, it couldn't.
// Look at the leaves and assign them a bit count and an encoding as specified
Suggested:
// assignEncodingAndSize assigns bit counts and encodings to the leaves
// as specified in RFC 1951 3.2.2.
Acknowledged
// Update this Huffman Code object to be the minimum code for the specified frequency count.
Suggested:
// generate rewrites h to be the Huffman code for the given frequency count.
// freq[i] is the frequency of literal i, and maxBits is the maximum number
// of bits to use for any literal.
Acknowledged
Rename to 'clamp1to15'.
atLeastOne should be max(v, 1).
Or maybe
```
func clamp(lo, x, hi float32) float32 {
return min(max(lo, x), hi)
}
```
and then use clamp(1, x, 15) at the call sites.
Just using min/max where called and removing the helper. It is from before min/max.
return after this and drop else
Acknowledged
// Walk four quarters in parallel.
// Tested to be faster than walking halves.
Acknowledged
x, y, z, w := b[:n], b[n:], b[n+n:], b[n+n+n:]
The compiler is good enough that this should be fine:
```
b0, b1, b2, b3 := b[0*n:][:n], b[1*n:][:n], b[2*n:][:n], b[3*n:][:n]
for i, t := range b0 {
h0 := &h[t]
h1 := &h[b1[i]]
h2 := &h[b2[i]]
h3 := &h[b3[i]]
*h0++
*h1++
*h2++
*h3++
}
```
I noticed that you initialized v0,v1,v3,v2 (not v2,v3). If that's important it needs a detailed comment and a benchmark. :-)
Seeing this regression on amd64 when using above:
EncodeTwainConstant1e6-16 1.19ms ± 1% 1.29ms ± 7% +8.22% (p=0.000 n=10+9)
Changing the order is fine, so doing that.
It would help to have some guide here, or at the bottom of deflatefast.go, explaining the algorithm used for each level and why each level is different.
I tried
diff level1.go level2.go
diff level2.go level3.go
diff level3.go level4.go
diff level4.go level5.go
diff level5.go level6.go
and it almost looks like every level is bespoke, without an overarching structure and organization. Of course I am sure that's not the case. It would help a lot to explain that in a comment, and also unnecessary differences in those diffs should be minimized, if you see any when you run those commands.
Acknowledged. Added this comment in deflatefast.go:
// fastEncL1 to fastEncL6 provide specialized encoders for levels 1-6
// that each provide different speed/size/memory strategies.
//
// Level 1: Single small table, 5 byte hashes, sparse indexing.
// Level 2: Single big table, 5 byte hashes, indexing ~ every 2 bytes.
// Level 3: Single medium table, 5 byte hashes, 2 candidates per table entry.
// Level 4: Two tables, 4/7 byte hashes, 1 candidate per table entry.
// Level 5: Two tables, 4/7 byte hashes, 2 candidates per 7-byte table entry.
// Level 6: Two tables, 4/7 byte hashes, full indexing, checks for repeats.
//
// Skipping on contiguous non-matches also decreases as levels go up.
// shiftMask is a no-op shift mask for x86-64.
// Using it lets the compiler omit the check for shift size >= 64.
const shiftMask = 63
Acknowledged
// shiftMask is a no-op shift mask for non-x86-64.
// The compiler will optimize it away.
const shiftMask = 0xFF
Acknowledged
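The shiftMask trick above can be sketched in isolation. On amd64 the hardware shift instructions already mask the count to 6 bits, so writing the mask explicitly lets the compiler omit the "shift >= 64" guard that Go otherwise inserts for variable shifts, without changing behavior for in-range counts:

```go
package main

import "fmt"

// shiftMask is a no-op shift mask for 64-bit shifts on amd64:
// masking with 63 matches what SHL/SHR do anyway, so the compiler
// can drop the explicit oversized-shift check.
const shiftMask = 63

func shl(x uint64, n uint) uint64 {
	return x << (n & shiftMask)
}

func main() {
	fmt.Println(shl(1, 8)) // 256
}
```

Note the semantic caveat: for n >= 64 plain Go `x << n` yields 0, while the masked form wraps the count, so the mask is only a no-op when callers guarantee n < 64.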
func (t *tokens) Reset() {
Doc comment.
Acknowledged
func indexTokens(in []token) tokens {
Doc comment.
Acknowledged
func (t *tokens) indexTokens(in []token) {
Doc comment.
Acknowledged
func (t *tokens) AddLiteral(lit byte) {
Doc comment.
Acknowledged
// from https://stackoverflow.com/a/28730362
Doc comment.
Acknowledged
// EstimatedBits will return a minimum size estimated by an *optimal*
Suggested:
// EstimatedBits returns an estimated minimum size for the
// optimal compression of t.
Acknowledged
func (t *tokens) AddEOB() {
Doc comment.
Acknowledged
// Returns the type of a token
Suggested: // typ returns ...
Acknowledged
// literal returns the literal value of t.
Acknowledged
// Returns the extra offset of a match token
Suggested: // offset returns ...
Acknowledged
// Convert length to code.
Suggested: // lengthCode ...
Acknowledged
// Returns the offset code corresponding to a specific offset
Suggested: // offsetCode ...
Acknowledged
b.Run(fmt.Sprint("level-", level), func(b *testing.B) {
Please use level=, which will work better with benchstat.
Acknowledged
t.Run(fmt.Sprintf("level-%d", l), func(t *testing.T) {
Use level=.
Acknowledged
buff := []byte{120, 156, 202, 72, 205, 201, 201, 215, 81, 40, 207,
Please update to match ExampleNewWriter.
Trying out proposals. Will do the int32-uint32 changes tomorrow or early next week.
for {
Please see if you can find a way to write this more clearly. The indentation is 12 levels deep at one point. A few lines of high-level comment about the strategy would help too.
Will attempt as separate commit.
Extracted match-at-end and skipLiteral. Removes 'else'.
Attempted to extract 'insertHash/emitLiteral', but speed penalty was almost 20% due to call cost.
This takes out the worst of the complexity at minimal cost:
λ benchstat before.txt after.txt
name old time/op new time/op delta
Encode/Digits/Compression/1e4-16 109µs ± 5% 110µs ± 2% ~ (p=0.912 n=10+10)
Encode/Digits/Compression/1e5-16 3.26ms ± 1% 3.40ms ± 1% +4.43% (p=0.000 n=10+10)
Encode/Digits/Compression/1e6-16 36.8ms ± 1% 38.1ms ± 1% +3.56% (p=0.000 n=10+10)
Encode/Newton/Compression/1e4-16 118µs ± 6% 114µs ± 1% -3.61% (p=0.002 n=10+10)
Encode/Newton/Compression/1e5-16 4.38ms ± 1% 4.49ms ± 1% +2.49% (p=0.000 n=10+10)
Encode/Newton/Compression/1e6-16 47.3ms ± 1% 48.4ms ± 1% +2.26% (p=0.000 n=10+10)
name old speed new speed delta
Encode/Digits/Compression/1e4-16 91.5MB/s ± 5% 91.0MB/s ± 2% ~ (p=0.912 n=10+10)
Encode/Digits/Compression/1e5-16 30.7MB/s ± 1% 29.4MB/s ± 1% -4.25% (p=0.000 n=10+10)
Encode/Digits/Compression/1e6-16 27.2MB/s ± 1% 26.3MB/s ± 1% -3.43% (p=0.000 n=10+10)
Encode/Newton/Compression/1e4-16 85.0MB/s ± 6% 88.1MB/s ± 1% +3.66% (p=0.002 n=10+10)
Encode/Newton/Compression/1e5-16 22.8MB/s ± 1% 22.3MB/s ± 1% -2.43% (p=0.000 n=10+10)
Encode/Newton/Compression/1e6-16 21.1MB/s ± 1% 20.7MB/s ± 1% -2.21% (p=0.000 n=10+10)
This will only affect levels 7-9, which I don't consider time-sensitive, so I suggest we keep it there.
// Inline manually when performance is critical.
Not wild about this comment. Let's figure out how to avoid this. Perhaps something like
```
type bits struct {
b uint64
nb uint
}

func (b bits) writeCode(c hcode, w *huffmanBitWriter) bits {
b.b |= c.code64() << (b.nb & shiftMask)
if b.nb += c.len(); b.nb >= 48 {
w.flushBits(b.b)
b.b >>= 48
b.nb -= 48
}
return b
}
```
That should inline just fine. It might even work to use a pointer method and drop the return, but I'm not sure.
Then the callers who want to move the bit writing state to the stack can do
```
bits := w.bits
... bits = bits.writeCode(c, w) ...
w.bits = bits
```
I will do this separately.
About 50% speed decrease as proposed on Huffman. Even with pointer receivers and nb as uint8 (to avoid the &) it is unable to inline.
Quick benchmarks:
│ Benchmark │ Baseline │ Value recv │ Ptr recv │
│ Digits/Huffman │ 958 MB/s │ 441 MB/s │ 386 MB/s │
│ Digits/Speed │ 165 MB/s │ 141 MB/s │ 141 MB/s │
│ Digits/Default │ 91 MB/s │ 82 MB/s │ 83 MB/s │
│ Digits/Compression │ 24 MB/s │ 24 MB/s │ 23 MB/s │
│ Newton/Huffman │ 732 MB/s │ 362 MB/s │ 334 MB/s │
│ Newton/Speed │ 222 MB/s │ 197 MB/s │ 190 MB/s │
│ Newton/Default │ 124 MB/s │ 119 MB/s │ 118 MB/s │
│ Newton/Compression │ 20 MB/s │ 20 MB/s │ 20 MB/s │
Diff with pointer: https://gist.github.com/klauspost/c01014d1194cb5f8c56d47d7f1bbb330
// Go 1.16 LOVES having these on stack.
As noted above, let's find a way to avoid this duplication. We should be able to get this down to
```
bits := w.bits
for _, t := range tokens {
if t < 256 {
bits = bits.writeCode(lits[t], w)
continue
}
...
}
w.bits = bits
```
I have tried *so* many variations on this piece of code since it is a big proportion of the lower level encoding. If you don't mind, I would rather keep this as something to evaluate later where we can do clean before/after and then decide if the cleanup is worth any potential penalty.
See above for results without manual inlining.
I understand that you would like to fix the compiler rather than having manually inlined code. I don't really have much say in this... And most of my approach has been to simply deal with the codegen given - and tweak until benchmarks showed improvements.
I checked if doing the vars-on-stack still matters...
Summary at 1e6:
│ Benchmark │ Stack locals │ No stack locals │ Delta │
│ Digits/Huffman │ 878 MB/s │ 523 MB/s │ -40% │
│ Digits/Speed │ 158 MB/s │ 148 MB/s │ -6% │
│ Digits/Default │ 86 MB/s │ 82 MB/s │ -5% │
│ Digits/Compression │ 22 MB/s │ 22 MB/s │ ~0% │
│ Newton/Huffman │ 683 MB/s │ 438 MB/s │ -36% │
│ Newton/Speed │ 208 MB/s │ 201 MB/s │ -3% │
│ Newton/Default │ 116 MB/s │ 115 MB/s │ ~0% │
│ Newton/Compression │ 19 MB/s │ 19 MB/s │ ~0% │
The stack-local trick matters a lot for Huffman (~38% faster) and noticeably for Speed (~5%). In my opinion the 5% would be ok, but the cleanup isn't that big, so not adding it.
// Descending to only have 1 bounds check.
_ = list[2] // check bounds here instead of in loop
(and then put the old code back)
Will the compiler be able to move the lookup itself out of the loop? When I wrote this it wasn't.
Actually a tiny bit faster either way 😊
λ benchstat before.txt after.txt
name old time/op new time/op delta
Encode/Digits/Speed/1e4-16 29.4µs ± 3% 29.4µs ± 3% ~ (p=0.912 n=10+10)
Encode/Digits/Speed/1e5-16 495µs ± 2% 493µs ± 2% ~ (p=0.497 n=10+9)
Encode/Digits/Speed/1e6-16 6.02ms ± 3% 6.00ms ± 2% ~ (p=0.684 n=10+10)
Encode/Newton/Speed/1e4-16 34.1µs ± 3% 34.4µs ± 3% ~ (p=0.218 n=10+10)
Encode/Newton/Speed/1e5-16 428µs ± 8% 414µs ± 2% -3.32% (p=0.011 n=10+10)
Encode/Newton/Speed/1e6-16 4.69ms ± 3% 4.60ms ± 3% -2.01% (p=0.043 n=10+10)
name old speed new speed delta
Encode/Digits/Speed/1e4-16 340MB/s ± 3% 340MB/s ± 3% ~ (p=0.912 n=10+10)
Encode/Digits/Speed/1e5-16 202MB/s ± 2% 203MB/s ± 2% ~ (p=0.497 n=10+9)
Encode/Digits/Speed/1e6-16 166MB/s ± 3% 167MB/s ± 2% ~ (p=0.684 n=10+10)
Encode/Newton/Speed/1e4-16 293MB/s ± 3% 291MB/s ± 3% ~ (p=0.210 n=10+10)
Encode/Newton/Speed/1e5-16 234MB/s ± 8% 242MB/s ± 2% +3.34% (p=0.011 n=10+10)
Encode/Newton/Speed/1e6-16 213MB/s ± 3% 218MB/s ± 3% +2.05% (p=0.043 n=10+10)
This is borderline, so leaving the decision to you. AFAICT there are no more pending issues for me to address.
length int
It looks like a bunch of conversions would disappear if these were maintained as int32, and similarly for the hash arrays below.
Yes, should be pretty safe, since it will be the same on 32 bits. I will do it separately to evaluate if we get more/less conversions - since slice length, etc would need converting.
Added this alongside #85 - it is cleaner, but gives a minor regression for levels 7-9.
λ benchstat before.txt after.txt
name old time/op new time/op delta
Encode/Digits/Compression/1e6-16 37.8ms ± 1% 39.6ms ± 1% +4.56% (p=0.000 n=9+10)
Encode/Newton/Compression/1e6-16 49.8ms ± 3% 49.7ms ± 1% ~ (p=0.529 n=10+10)
name old speed new speed delta
Encode/Digits/Compression/1e6-16 26.4MB/s ± 1% 25.3MB/s ± 1% -4.37% (p=0.000 n=9+10)
Encode/Newton/Compression/1e6-16 20.1MB/s ± 3% 20.1MB/s ± 1% ~ (p=0.541 n=10+10)
Sending in the change. Let me know if you'd prefer to revert.
hashMatch [maxMatchLength + minMatchLength]uint32
Unless the window can be > 2GB, use int32 to avoid weird math around 0.
Will attempt as separate commit.
See #75
// reset the compressor with a new output writer.
reset resets ...
// close will flush any uncompressed data and write an EOF block.
s/will flush/flushes/
s/write/writes/
func init() {
Avoid init-time work. Define offsetCombined as a static array.
Fine to add a test that has this computation in it to check that it's correct,
if you're worried about that.
Acknowledged. Added the table, but kept the generation code as a comment, to make the values less "magic". Please check if you find that a reasonable approach.
Works for me, thanks.
// If 'wroteHuffman' is set, a table for outputting only literals
Drop '' around name.
// flush the currently encoded data.
flush flushes
// write the provided bytes directly to the output,
// write writes ...
// Quick check for incompressible content.
I am confused about what this is doing. It is checking whether
sum_i (f_i - n/256)^2 < 2n
where i ranges over 0..255 and f_i is the frequency of byte i.
But I don't really understand why it makes sense to compare the sum of the squares of the deviations from perfect uniformity with 2n. Is there some reference to this algorithm that can be added as a comment?
It is mostly made by experimentation. It lies a few years back, so the exact details escape me.
The main goal was to detect cases where distribution was so flat that Huffman would not be able to compress anything and quickly bail out without generating a table for an exact count.
The best I recall, I experimented by using pure Huffman and simply adjusting the threshold value until various inputs would start showing compression drops, but still catch random data reliably.
OK fair enough. Thanks. Maybe add a second line to comment
// This is a heuristic that works well enough
// but is not based on any specific algorithm.
w.literalEncoding, w.tmpLitEncoding = w.tmpLitEncoding, w.literalEncoding
When do these get un-swapped?
They are swapped, so the next call to writeBlockHuff can use `w.tmpLitEncoding` non-destructively in `w.tmpLitEncoding.generate(w.literalFreq[:numLiterals], 15)`.
This allows us to reuse the previous encoding if a new one is deemed too expensive in `if estBits < reuseSize {`.
So this will swap back and forth by itself.
Acknowledged
if w.nbits >= 48 {
How can this be? Doesn't everything that adds to bits check and flush?
nbits are checked before writing, since various writes have different limits.
Acknowledged
x, y, z, w := b[:n], b[n:], b[n+n:], b[n+n+n:]
The compiler is good enough that this should be fine:
```
b0, b1, b2, b3 := b[0*n:][:n], b[1*n:][:n], b[2*n:][:n], b[3*n:][:n]
for i, t := range b0 {
h0 := &h[t]
h1 := &h[b1[i]]
h2 := &h[b2[i]]
h3 := &h[b3[i]]
*h0++
*h1++
*h2++
*h3++
}
```
I noticed that you initialized v0,v1,v3,v2 (not v2,v3). If that's important it needs a detailed comment and a benchmark. :-)
Seeing this regression on amd64 when using above:
EncodeTwainConstant1e6-16 1.19ms ± 1% 1.29ms ± 7% +8.22% (p=0.000 n=10+9)
Changing the order is fine, so doing that.
| Commit-Queue | +1 |
// reset the compressor with a new output writer.
reset resets ...
Acknowledged
// close will flush any uncompressed data and write an EOF block.
s/will flush/flushes/
s/write/writes/
Acknowledged
// If 'wroteHuffman' is set, a table for outputting only literals
Drop '' around name.
Acknowledged
// flush the currently encoded data.
flush flushes
Acknowledged
// write the provided bytes directly to the output,
// write writes ...
Acknowledged
// Quick check for incompressible content.
I am confused about what this is doing. It is checking whether
sum_i (f_i - n/256)^2 < 2n
where i ranges over 0..255 and f_i is the frequency of byte i.
But I don't really understand why it makes sense to compare the sum of the squares of the deviations from perfect uniformity with 2n. Is there some reference to this algorithm that can be added as a comment?
It is mostly made by experimentation. It was a few years back, so the exact details escape me.
The main goal was to detect cases where distribution was so flat that Huffman would not be able to compress anything and quickly bail out without generating a table for an exact count.
The best I recall, I experimented by using pure Huffman and simply adjusting the threshold value until various inputs would start showing compression drops, but still catch random data reliably.
OK fair enough. Thanks. Maybe add a second line to comment
// This is a heuristic that works well enough
// but is not based on any specific algorithm.