The compromise makes a lot of sense to me. I had to use the `cgzip` bindings in many projects, adding complexity to builds and preventing the use of many tools like gorename, so that I could get a faster "BestSpeed" story.
Nice speedups indeed. Does this require any assembly code?
What I am proposing in the CL is to replace levels 1 to 3 with specialized versions that are significantly faster. Usually throughput is around 1.7x that of the standard encoder. This does come at a compression cost; usually the loss is between 2 and 3 percent compared to the original levels 1, 2 and 3.
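For readers following along, these are the levels selected through the standard compress/flate API; the snippet below is plain stdlib usage (nothing from the CL itself) and just shows which call sites would be affected:

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"log"
)

func main() {
	data := bytes.Repeat([]byte("hello gopher "), 1000)
	// Level 1 is flate.BestSpeed; levels 2 and 3 are plain integers.
	for _, level := range []int{flate.BestSpeed, 2, 3} {
		var buf bytes.Buffer
		w, err := flate.NewWriter(&buf, level)
		if err != nil {
			log.Fatal(err)
		}
		if _, err := w.Write(data); err != nil {
			log.Fatal(err)
		}
		if err := w.Close(); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("level %d: %d compressed bytes\n", level, buf.Len())
	}
}
```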
I'm OK with this in principle, with a few remarks below.
Re the overall pictures at http://imgur.com/a/5sWM4 and at
https://blog.klauspost.com/rebalancing-deflate-compression-levels/
it's not obvious to me whether the graphs are showing the old or new
algorithms for levels 1, 2 and 3. In fact, I think the best graph
would show both.
Flate is normally Huffman + LZ77. In
https://go-review.googlesource.com/#/c/20982/ you are also proposing a
Huffman-only compression level. How does that relate (if at all) to
this proposal?
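For reference, if that CL lands as a new writer level, I'd expect Huffman-only usage to look something like this (the HuffmanOnly constant name is my assumption about how the CL exposes it):

```go
package main

import (
	"bytes"
	"compress/flate"
	"log"
)

func main() {
	var buf bytes.Buffer
	// Assumption: CL 20982 adds a HuffmanOnly level constant alongside
	// BestSpeed etc. Entropy coding only, no LZ77 match searching.
	w, err := flate.NewWriter(&buf, flate.HuffmanOnly)
	if err != nil {
		log.Fatal(err)
	}
	if _, err := w.Write([]byte("aaaaabbbbbccccc")); err != nil {
		log.Fatal(err)
	}
	if err := w.Close(); err != nil {
		log.Fatal(err)
	}
}
```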
Should we also have an LZ77-only compression level?
Should we have more than one such level? Conversely, when introducing
Snappy style matching, should we replace levels 1, 2 and 3, or just
level 1 aka BestSpeed?
Please don't do that. We have had problems with this before.
Also, that CL has three separate copy-pasted-and-specialized methods
for levels 1, 2 and 3. Yes, each method has a sentence or two in
comment, but it would be easier to understand the difference between
the three actual algorithms if there was just a single method, with
some "if e.level > 2" style blocks. Sure, copy-paste-and-specialize
can increase performance (at a cost of having more code), but it can
easily be a follow-up CL.
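To make that suggestion concrete, here is a rough sketch of the single-method shape I have in mind. All the names here (fastEncoder, emitLiteral, emitCopy, matchLen) and the toy hash are illustrative placeholders, not the CL's actual code:

```go
package flatesketch

const minMatch = 4

type fastEncoder struct {
	level int
	table [1 << 14]int32
}

// hash computes a toy hash of the next four bytes of input.
func hash(b []byte) uint32 {
	u := uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
	return (u * 0x1e35a7bd) >> (32 - 14)
}

// matchLen verifies how many bytes at candidate and s actually match.
func matchLen(src []byte, candidate, s int) int {
	n := 0
	for s+n < len(src) && src[candidate+n] == src[s+n] {
		n++
	}
	return n
}

func emitLiteral(b byte)          { /* append a literal token */ }
func emitCopy(offset, length int) { /* append a match token */ }

// encode is one method for all three levels, with level checks inline
// instead of three copy-pasted specializations.
func (e *fastEncoder) encode(src []byte) {
	for s := 0; s+minMatch <= len(src); {
		h := hash(src[s:])
		candidate := int(e.table[h])
		e.table[h] = int32(s)
		if e.level > 2 {
			// Higher levels would do their extra work here (e.g. a
			// second hash probe) rather than in a separate method.
		}
		m := 0
		if candidate < s {
			m = matchLen(src, candidate, s)
		}
		if m >= minMatch {
			emitCopy(s-candidate, m)
			s += m
		} else {
			emitLiteral(src[s])
			s++
		}
	}
}
```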
In any case, at the risk of repeating myself in case there was a
mis-communication: even if there is (or was) a big gap between level 3
and level 4, it would be great to see both the old and new algorithms
for levels 1-3 on the graph.
They may offer good progression (again, seeing both the before and the
after for levels 1-3 on the graphs would help inform the discussion),
but in practice, do people use anything other than BestSpeed,
BestCompression, NoCompression or DefaultCompression? I don't know the
answer, but if not, it might not be worth having three copy/pastes of
the new algorithm if nobody uses levels 2 or 3.
Even if you are now resetting every 64KB, that still means that the "s
+= 1 + ((s - lit) >> 4)" skipping can be very aggressive within each
64KB block. Specifically, it is exponential instead of quadratic (see,
for example, the table after "It prints" on the golang/snappy commit
linked to immediately above), which I think is too aggressive.
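To illustrate the difference in growth rates, here is a toy calculation (not the real encoders, just the two step rules) that counts how many hash-miss steps each rule takes to cross a 64KB block:

```go
package main

import "fmt"

func main() {
	const blockSize = 64 << 10

	// Distance-based rule from the CL: the step grows with (s - lit),
	// the distance scanned since the literal run began, so the
	// position grows geometrically (exponential in the miss count).
	s, lit, misses := 0, 0, 0
	for s < blockSize {
		s += 1 + ((s - lit) >> 4)
		misses++
	}
	fmt.Println("distance-based rule crosses 64KB after", misses, "misses")

	// Count-based rule, modeled loosely on C++ snappy's skip++ >> 5:
	// the step grows with the miss count, so the total distance
	// scanned is quadratic in the miss count.
	s, misses = 0, 0
	for s < blockSize {
		misses++
		s += 1 + (misses >> 5)
	}
	fmt.Println("count-based rule crosses 64KB after", misses, "misses")
}
```

By my arithmetic the distance-based rule blows through the block in a couple hundred misses, while the count-based rule takes over ten times as many, which is the aggressiveness I'm worried about.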
I know that the compression experts that made google/snappy test
algorithmic tweaks on large corpora of test data, whether samples of a
web crawl or of RPC traffic, although it's clearly not practical to
check hundreds of megabytes of test data into the google/snappy git
repository. If you can convince the C++ google/snappy maintainers to
change their skipping algorithm to yours, then I'm happy to adopt your
skipping algorithm. But until then, I don't think I've seen enough
evidence to move off the safe choice.
The test files are each either all compressible or all incompressible.
Three isn't a large sample size, and it doesn't cover things like PDF
files which, IIUC, are like a textual format (i.e. compressible) that
can embed binary resources such as JPEG images (i.e. incompressible)
all in the one file.
Finally, I hope that I haven't come across as being too negative in
the various flate discussions. I have made a number of criticisms, but
I aspire to be constructive. Please let me know if you are not finding
that to be the case. I do appreciate the work you've done, and I do
think that it will significantly improve Go 1.7 and beyond.
I'd prefer only replacing level 1, and leaving 2 and 3 alone,
especially since, AFAICT,
https://go-review.googlesource.com/#/c/21021/4/src/compress/flate/deflatefast.go
still has a lot of copy/paste between encodeL1, encodeL2 and encodeL3.
> https://groups.google.com/forum/#!topic/snappy-compression/3Qa3fASLkNA
FWIW, my reading of that discussion thread is that Steinar Gunderson
(one of the C++ snappy authors) reckons that, while it gives much
better throughput on binary data and slightly worse on textual data,
the "1 + ((s - lit) >> 5)" skipping algorithm is a net negative for
the C++ implementation, so I'm less inclined to adopt it for the Go
implementation, whether in golang/snappy or in compress/flate.
Just for the sake of the historical record, the original idea comes from the LZO compressor, which I ported to Go a few months ago. Klaus spotted it while benchmarking LZO against Snappy, extracted it, and adapted it, so all credit goes to him anyway. I just love the fact that this little, uncommented compression trick has now passed through several packages, only to be fully re-analyzed some 20 years after it was first conceived :)