I have finished two new versions of the deflater, based on the current version.
They are implemented as "level 2" and "level 3" here:
Level 1 is the same as Go tip and is used here as a reference.
Level 2 does compression across block boundaries. It does this by copying the previous buffer and having offsets continually increase. It also stores the 4 source bytes along with the offset in the hash table; this way we don't have to look up the value in history to check if it matches.
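To illustrate the level 2 hash-table idea, here is a minimal, self-contained sketch (not the actual implementation; names like tableEntry and hash4 are illustrative). Each table entry keeps the 4 hashed source bytes next to the offset, so a hash hit can be validated by comparing the stored value instead of reading back into the history buffer:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// tableEntry stores the 4 source bytes alongside the offset, so a hash
// hit can be validated without touching the history buffer.
type tableEntry struct {
	val    uint32 // the 4 bytes that were hashed
	offset int32  // position + 1 in the (continually increasing) offset space; 0 means empty
}

const tableBits = 14
const tableSize = 1 << tableBits

// hash4 hashes 4 bytes into a table index (multiplicative hash, illustrative constant).
func hash4(u uint32) uint32 {
	return (u * 2654435761) >> (32 - tableBits)
}

func main() {
	var table [tableSize]tableEntry
	src := []byte("abcdabcdabcd")

	for i := 0; i+4 <= len(src); i++ {
		cur := binary.LittleEndian.Uint32(src[i:])
		h := hash4(cur)
		cand := table[h]
		// The stored bytes tell us immediately whether the candidate
		// really matches; no lookup into history is needed.
		if cand.offset != 0 && cand.val == cur {
			fmt.Printf("match at %d -> offset %d\n", i, cand.offset-1)
		}
		table[h] = tableEntry{val: cur, offset: int32(i) + 1}
	}
}
```

On the repeating input above this reports a match at every position from 4 onward, each validated purely from the table entry.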
Level 3 can check the previous match if the actual bytes do not match after the hash lookup. The speed impact is small, but so is the improvement.
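A sketch of the level 3 fallback, again illustrative rather than the real code: each bucket keeps the newest candidate plus the one it displaced, and the previous candidate is only consulted when the newest one's bytes fail to match. A deliberately weak hash (first two bytes only, an assumption for the demo) forces the collision that makes the fallback visible:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

type tableEntry struct {
	val    uint32 // the 4 bytes that were hashed
	offset int32  // position + 1; 0 means empty
}

// bucket keeps the newest candidate and the one it displaced.
type bucket struct {
	cur, prev tableEntry
}

// weakHash keys on the first two bytes only (deliberately weak, so
// different 4-byte sequences sharing a 2-byte prefix collide).
func weakHash(u uint32) uint32 {
	return u & 0xffff
}

func main() {
	var table [1 << 16]bucket
	src := []byte("abcdQQabXYQQabcd")

	for i := 0; i+4 <= len(src); i++ {
		u := binary.LittleEndian.Uint32(src[i:])
		h := weakHash(u)
		b := table[h]
		switch {
		case b.cur.offset != 0 && b.cur.val == u:
			fmt.Printf("i=%d: newest candidate matches (offset %d)\n", i, b.cur.offset-1)
		case b.prev.offset != 0 && b.prev.val == u:
			// Newest bytes didn't match; the previous candidate does.
			fmt.Printf("i=%d: previous candidate matches (offset %d)\n", i, b.prev.offset-1)
		}
		// Insert: newest becomes cur, old cur is demoted to prev.
		table[h] = bucket{
			cur:  tableEntry{val: u, offset: int32(i) + 1},
			prev: b.cur,
		}
	}
}
```

Here "abXY" displaces "abcd" in the "ab" bucket, so the final "abcd" misses on the newest entry but is recovered from the previous one; that recovered match is the extra gain level 3 buys.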
AMD64, 1 core used.
"Level", "Gzipped size", "Throughput".
Web content (sites, 481601400 bytes):
1, 165556800 bytes, 71.96 MB/s
2, 163167700 bytes, 70.18 MB/s
3, 162457100 bytes, 66.70 MB/s
enwik9 (1000000000 bytes):
1, 391052014 bytes, 78.36 MB/s
2, 382585541 bytes, 73.99 MB/s
3, 379124969 bytes, 69.29 MB/s
Highly compressible JSON (adresser.001, 1073741824 bytes):
1, 58161197 bytes, 288.94 MB/s
2, 48010017 bytes, 301.03 MB/s
3, 44563592 bytes, 357.03 MB/s
I tried changing level 3 so it would always check the previous match and see if it was better. The speed impact was so big that it didn't make sense as a candidate here.
Let me know if you think we should move forward with this.