I have finished two new versions of the deflater, based on the current version.
They are implemented as "level 2" and "level 3" here:
Level 1 is the same as Go tip and is used here as a reference.
Level 2 does compression across block boundaries. It does this by copying the previous buffer and having offsets continually increase. It also stores the 4 source bytes along with the offset in the hash table; this way we don't have to look up the value in history to check if it matches.
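To illustrate the level 2 hash-table idea, here is a minimal, self-contained sketch (not the actual implementation; names like tableEntry and hash4 are illustrative). Each table entry keeps the 4 hashed source bytes next to the offset, so a hash hit can be validated by comparing the stored value instead of reading back into the history buffer:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// tableEntry stores the 4 source bytes alongside the offset, so a hash
// hit can be validated without touching the history buffer.
type tableEntry struct {
	val    uint32 // the 4 bytes that were hashed
	offset int32  // position + 1 in the (continually increasing) offset space; 0 means empty
}

const tableBits = 14
const tableSize = 1 << tableBits

// hash4 hashes 4 bytes into a table index (multiplicative hash, illustrative constant).
func hash4(u uint32) uint32 {
	return (u * 2654435761) >> (32 - tableBits)
}

func main() {
	var table [tableSize]tableEntry
	src := []byte("abcdabcdabcd")

	for i := 0; i+4 <= len(src); i++ {
		cur := binary.LittleEndian.Uint32(src[i:])
		h := hash4(cur)
		cand := table[h]
		// The stored bytes tell us immediately whether the candidate
		// really matches; no lookup into history is needed.
		if cand.offset != 0 && cand.val == cur {
			fmt.Printf("match at %d -> offset %d\n", i, cand.offset-1)
		}
		table[h] = tableEntry{val: cur, offset: int32(i) + 1}
	}
}
```

On the repeating input above this reports a match at every position from 4 onward, each validated purely from the table entry.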
Level 3 can check the previous match if the actual bytes do not match after the hash lookup. The speed impact is small, but so is the improvement.
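A sketch of the level 3 fallback, again illustrative rather than the real code: each bucket keeps the newest candidate plus the one it displaced, and the previous candidate is only consulted when the newest one's bytes fail to match. A deliberately weak hash (first two bytes only, an assumption for the demo) forces the collision that makes the fallback visible:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

type tableEntry struct {
	val    uint32 // the 4 bytes that were hashed
	offset int32  // position + 1; 0 means empty
}

// bucket keeps the newest candidate and the one it displaced.
type bucket struct {
	cur, prev tableEntry
}

// weakHash keys on the first two bytes only (deliberately weak, so
// different 4-byte sequences sharing a 2-byte prefix collide).
func weakHash(u uint32) uint32 {
	return u & 0xffff
}

func main() {
	var table [1 << 16]bucket
	src := []byte("abcdQQabXYQQabcd")

	for i := 0; i+4 <= len(src); i++ {
		u := binary.LittleEndian.Uint32(src[i:])
		h := weakHash(u)
		b := table[h]
		switch {
		case b.cur.offset != 0 && b.cur.val == u:
			fmt.Printf("i=%d: newest candidate matches (offset %d)\n", i, b.cur.offset-1)
		case b.prev.offset != 0 && b.prev.val == u:
			// Newest bytes didn't match; the previous candidate does.
			fmt.Printf("i=%d: previous candidate matches (offset %d)\n", i, b.prev.offset-1)
		}
		// Insert: newest becomes cur, old cur is demoted to prev.
		table[h] = bucket{
			cur:  tableEntry{val: u, offset: int32(i) + 1},
			prev: b.cur,
		}
	}
}
```

Here "abXY" displaces "abcd" in the "ab" bucket, so the final "abcd" misses on the newest entry but is recovered from the previous one; that recovered match is the extra gain level 3 buys.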
AMD64, 1 core used.
"Level", "Gzipped size", "Throughput".
Web content (sites, 481601400 bytes):
1, 165556800 bytes, 71.96 MB/s
2, 163167700 bytes, 70.18 MB/s
3, 162457100 bytes, 66.70 MB/s
enwik9 (1000000000 bytes):
1, 391052014 bytes, 78.36 MB/s
2, 382585541 bytes, 73.99 MB/s
3, 379124969 bytes, 69.29 MB/s
Highly compressible JSON (adresser.001, 1073741824 bytes):
1, 58161197 bytes, 288.94 MB/s
2, 48010017 bytes, 301.03 MB/s
3, 44563592 bytes, 357.03 MB/s
I tried changing level 3 so it would always check the previous match and see if it was better. The speed impact was so big that it didn't make sense as a candidate here.
Let me know if you think we should move forward with this.