Brotli


Marco Peereboom

Oct 9, 2015, 10:38:27 AM
to golang-nuts
Anyone know of a golang implementation of brotli?

https://github.com/google/brotli/

Klaus Post

Oct 10, 2015, 5:23:46 AM
to golang-nuts, ma...@peereboom.us
Hi!

On Friday, 9 October 2015 16:38:27 UTC+2, Marco Peereboom wrote:
Anyone know of a golang implementation of brotli? 

Considering it has just been released, I wouldn't expect so. Maybe someone has wrapped the C-code.

I might do a port at some point, but I will wait a little for the C code to stabilize.

It seems like an ok alternative to gzip for small sizes, but it is not like it blows it out of the water. If you are able to pre-compress the content it seems quite good, since the best compression modes are too slow to be considered for real-time compression at the moment.

It is a pity they already defined the dictionary. Looking at it, it seems like it was just slapped out there as "good enough". If someone had spent a month collecting a lot of data (i.e. monitoring net traffic) and analyzed it, I think compression for HTTP could have been a few percent better. It looks like a lot of concatenated strings with a little HTML and JS in there. If it was machine generated, most words would probably be separated by spaces for longer matches. Seems like a missed opportunity.

I dumped a text version of the dictionary - 80 character wrapped here: https://gist.github.com/klauspost/2900d5ba6f9b65d69c8e



/Klaus

Axel Wagner

Oct 10, 2015, 6:10:44 AM
to Klaus Post, golang-nuts, ma...@peereboom.us
Klaus Post <klau...@gmail.com> writes:
> If someone spent a
> month collecting a lot of data (ie monitor net traffic), and analyzed it, I
> think that compression for HTTP could have been a few percent better.

Isn't this by Google? Don't you think they did that? And are able to
collect a far more representative sample and far more data than most
people or organisations?

Klaus Post

Oct 10, 2015, 7:41:14 AM
to golang-nuts, klau...@gmail.com, ma...@peereboom.us
On Saturday, 10 October 2015 12:10:44 UTC+2, Axel Wagner wrote:
Isn't this by Google? Don't you think they did that? And are able to
collect a far more representative sample and far more data than most
people or organisations?

Maybe they have, but a lack of openness about the process always invites questions.

/Klaus

Marco Peereboom

Oct 10, 2015, 9:27:38 AM
to Klaus Post, golang-nuts


On Sat, Oct 10, 2015 at 02:23:45AM -0700, Klaus Post wrote:
> Hi!
> On Friday, 9 October 2015 16:38:27 UTC+2, Marco Peereboom wrote:
>
> Anyone know of a golang implementation of brotli?
>
> Considering it has just been released, I wouldn't expect so. Maybe someone
> has wrapped the C-code.
> I might do a port at some point, but I will wait a little for the C code
> to stabilize.
> It seems like an ok alternative to gzip for small sizes, but it is not
> like it blows it out of the water. If you are able to pre-compress the
> content it seems quite good, since the best compression modes are too slow
> to be considered for real-time compression at the moment.
> It is a pity they already defined the dictionary. Looking at it, it seems
> like it is just slapped out there as "good enough". If someone spent a
> month collecting a lot of data (ie monitor net traffic), and analyzed it,
> I think that compression for HTTP could have been a few percent better.
> This seems like a lot of concatenated strings with a little html and js in
> there. If it was machine generated most word would probably be separated
> by spaces for longer matches. Seems like a missed opportunity.
> I dumped a text version of the dictionary - 80 character wrapped
> here: https://gist.github.com/klauspost/2900d5ba6f9b65d69c8e
> /Klaus

I asked because all other languages have an implementation ;)

Reason I actually am interested is because it sounded like LZMA
compression at LZO speed from the blog. I am actively working on some
code that needs compression and speed hence it caught my eye.

Glad to hear you are somewhat excited about it. Love your zip stuff.

Klaus Post

Oct 10, 2015, 10:32:06 AM
to golang-nuts, klau...@gmail.com, ma...@peereboom.us
On Saturday, 10 October 2015 15:27:38 UTC+2, Marco Peereboom wrote:

I asked because all other languages have an implementation ;)

I think "all the other languages" are merely wrapping the C code.


Reason I actually am interested is because it sounded like LZMA
compression at LZO speed from the blog.  I am actively working on some
code that needs compression and speed hence it caught my eye.

Yes - it sounds too good to be true, and for general data it is: brotli is at least an order of magnitude slower than LZO.

I found this nice interactive benchmark: https://quixdb.github.io/squash-benchmark/

In general, brotli is an ok replacement for gzip for text content. Compression time is about the same as deflate, but with an improved compression ratio.

The brotli level 10+11 compresses well, but is *very* slow. But for static web content, that is exactly what you want. However, for most types of content there are better or faster compressors.

What type of content are you looking to compress?



Glad to hear you are somewhat excited about it.  Love your zip stuff.

I am really glad that we will have something better than gzip for web content once it is implemented in browsers.

/Klaus

Michael Jones

Oct 10, 2015, 10:56:50 AM
to Klaus Post, golang-nuts, ma...@peereboom.us

you...@z505.com

Oct 11, 2015, 4:14:45 AM
to golang-nuts, klau...@gmail.com, ma...@peereboom.us

Well, if your competitors have all the source code to every one of your secrets, you can't exactly run a corporation that makes a profit. If any search engine out there had access to all of Google's source code, well, then... Google would pretty much have no way of making money. Open software is great, except in the real world there is money involved, and keeping some secrets keeps you ahead of competitors. In fact I'm surprised at how much code Google has released; it's like a FOR PROFIT corporation that is also a charity, but no one ever gives Google credit for being a charity, since all the "big brother is watching you" slurs make it sound like an evil corp, which overall it is not.

Regards,
Z505
GNG is Not GNU
Critic of free software utopian ideological nuts
But contribute to open software myself.

Damian Gryski

Oct 11, 2015, 4:54:26 AM
to golang-nuts
The decoder is pure C and looks much simpler to translate than the compressor.

Damian

Klaus Post

Oct 11, 2015, 9:52:36 AM
to golang-nuts
On Sunday, 11 October 2015 10:54:26 UTC+2, Damian Gryski wrote:
The decoder is pure C and looks much simpler to translate than the compressor.

Yes, but it is still 2k LOC, so it's not done in a day or two ;)

A compressor is really the most useful part for Go, since Go is so often used for web servers. Of course you could deploy behind a reverse proxy and let it handle compression. It will take some time for browser support to trickle out to end users, so luckily we have a bit of time.

Also, with a bit of time, we can get an impression of how "safe" it is to use without SSL. Previous experience from trying other compression formats shows that problems can occur with proxies that can only handle deflate/gzip on plain HTTP.
 

Damian


/Klaus 

thebroke...@gmail.com

Oct 12, 2015, 4:36:32 AM
to golang-nuts
Reason I actually am interested is because it sounded like LZMA compression at LZO speed from the blog.  I am actively working on some code that needs compression and speed hence it caught my eye. 
One innovation Brotli uses is a 122 KiB static dictionary generated from web content in various languages and formats (CSS, JS, etc.). This does, however, mean that Brotli is not technically a "general purpose" algorithm, since it assumes the data it is compressing is largely text-based. It's a wonderful idea for web content, but you should try building the open source program yourself and benchmarking the speed and compression ratio on your own datasets. My own tests (compressing the source code of the Linux kernel) found that the compression ratio is better than DEFLATE's and LZMA's at a similar compression speed. However, for maximum compressibility, LZMA still beats Brotli. Even if your dataset can't take advantage of the static dictionary, Brotli may still achieve a better ratio than DEFLATE, since it allows sliding windows larger than 32 KiB (up to 16 MiB in the latest RFC draft).

I found this nice interactive benchmark: https://quixdb.github.io/squash-benchmark/
In general, brotli is an ok replacement for gzip for text content. Compression time is about the same as deflate, but with an improved compression ratio.
Interesting suite of benchmarks. I'm actually surprised to see that DEFLATE (or zlib) is pretty close to the Pareto frontier since I thought I had read in the past somewhere that DEFLATE was now far from it. Personally, I have always felt that DEFLATE did strike a surprisingly good balance between speed and ratio for generic input datasets for a format designed in the early 1990s.

Some random thoughts about using Brotli:
  • It may or may not be worth implementing in pure Go just yet since the RFC is still a draft and things are still changing. A CGo/SWIG wrapper may be the better idea for the time being.
  • I am a little concerned about the use of a static dictionary. Languages change over time and formats die and are born, which may reduce the effectiveness of this dictionary.
  • The 122Ki static dictionary obviously needs to be compiled into the binary. This may be expensive for embedded systems.
  • (not-big-deal) The current Go compiler is relatively inefficient at compiling large static byte slices. A Go file with just the dictionary alone takes 1s to compile on my machine.
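On the last point, one workaround people used at the time was to store the table as a string constant, which the compiler handles far more cheaply than a large []byte composite literal, and convert once at package init. A sketch with made-up stand-in dictionary bytes:

```go
package main

import "fmt"

// A string constant goes into the binary's read-only data cheaply, while a
// huge []byte{0x00, 0x01, ...} composite literal hits the compiler's slow
// path. The content below is a stand-in, not the real brotli dictionary.
const rawDict = "\x00\x01stand-in-dictionary-bytes\xff"

// One copy at package init gives a mutable-slice view of the data.
var dict = []byte(rawDict)

func main() {
	fmt.Println(len(dict), "bytes in dictionary")
}
```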

Klaus Post

Oct 12, 2015, 5:30:42 AM
to golang-nuts, thebroke...@gmail.com
On Monday, 12 October 2015 10:36:32 UTC+2, thebroke...@gmail.com wrote:
Interesting suite of benchmarks. I'm actually surprised to see that DEFLATE (or zlib) is pretty close to the Pareto frontier since I thought I had read in the past somewhere that DEFLATE was now far from it. Personally, I have always felt that DEFLATE did strike a surprisingly good balance between speed and ratio for generic input datasets for a format designed in the early 1990s.

Yes - it seems to stay relevant, since it has always had a good tradeoff between speed and acceptable compression. It was never the absolute best at either, but its flexibility and ability to scale have suited it well.
 

Some random thoughts about using Brotli:
  • It may or may not be worth implementing in pure Go just yet since the RFC is still a draft and things are still changing. A CGo/SWIG wrapper may be the better idea for the time being.

Agree, even though that will keep it from production use for some people, it will give a good base for testing. 

  • I am a little concerned about the use of a static dictionary. Languages change over time and formats die and are born, which may reduce the effectiveness of this dictionary.
Well, it is kind of an "it doesn't hurt" thing. If deflate had an initial dictionary, it would improve compression in some cases, but it would never hurt compression. It's much the same with brotli.
 
  • The 122Ki static dictionary obviously needs to be compiled into the binary. This may be expensive for embedded systems.
I guess, if it is a package then you could choose to not include it on some builds via build tags.

There are of course also all the transformed dictionaries, which will eat up some megabytes of RAM, along with the hash table for all entries. The C implementation has (some sort of) hashes of all transformed entries, as well as all valid transformation words, as static tables. I guess it is a question of trading 0.5s of startup time against 0.5s of compile time and a bigger binary. Tests are needed.

  • (not-big-deal) The current Go compiler is currently relatively inefficient about compiling large static byte slices. A Go file with just the dictionary alone takes 1s to compile on my machine.

My guess is that the compiler would be able to cache the object files with a "go install". You could maybe even keep the dictionaries in a separate package for that reason.

/Klaus

Andy Balholm

Oct 12, 2015, 1:04:45 PM
to Klaus Post, golang-nuts
> Also, with a bit of time, we can also get an impression of how "safe" it is to use without SSL. Previous experience from trying other compression formats show problems could occur with proxies that can only handle deflate/gzip on plain HTTP.

Don’t forget that there are proxies that decrypt SSL. But hopefully a proxy smart enough to do that is also smart enough to modify the Accept-Encoding header on requests.
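From the server side, that negotiation boils down to checking the Accept-Encoding request header before picking an encoding. A minimal sketch, assuming naive comma-separated tokens and ignoring q-values:

```go
package main

import (
	"fmt"
	"strings"
)

// acceptsBrotli reports whether an Accept-Encoding header value lists the
// "br" token. Deliberately naive: it assumes comma-separated tokens and
// ignores q-values, so "br;q=0" would be mishandled by this sketch.
func acceptsBrotli(acceptEncoding string) bool {
	for _, tok := range strings.Split(acceptEncoding, ",") {
		if strings.TrimSpace(tok) == "br" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(acceptsBrotli("gzip, deflate, br")) // true
	fmt.Println(acceptsBrotli("gzip, deflate"))     // false
}
```

A proxy that strips "br" from this header before forwarding the request would transparently make such a server fall back to gzip.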

Mike Houston

Oct 27, 2015, 8:10:54 PM
to golang-nuts, ma...@peereboom.us
I have put a cgo wrapper around the CompressBuffer/DecompressBuffer functions:

I have not wrapped the stream handling functions yet.

I've added the round-trip compression tests from the brotli repository, and all seems to work, but since my cgo experience was previously nil there may be some lurking issues. In particular I'm not sure how much buffer to allocate in some cases, so I have been a bit generous.

With cgo, is it appropriate to allocate a buffer in Go and pass it to the C decompression function to put its output in, or should I be allocating the buffer on the C-side and copying it to Go using GoBytes or reflect.SliceHeader? Or does it matter? Will the GC be blocked during the call into a C function?

Thoughts welcome.

Mike.


On Saturday, October 10, 2015 at 10:23:46 AM UTC+1, Klaus Post wrote:
Hi!

On Friday, 9 October 2015 16:38:27 UTC+2, Marco Peereboom wrote:
Anyone know of a golang implementation of brotli? 

Considering it has just been released, I wouldn't expect so. Maybe someone has wrapped the C-code.

<snip>

Ian Lance Taylor

Oct 27, 2015, 8:20:33 PM
to Mike Houston, golang-nuts, Marco Peereboom
On Tue, Oct 27, 2015 at 5:01 PM, Mike Houston <schm...@gmail.com> wrote:
>
> With cgo, is it appropriate to allocate a buffer in Go and pass it to the C
> decompression function to put its output in, or should I be allocating the
> buffer on the C-side and copying it to Go using GoBytes or
> reflect.SliceHeader? Or does it matter? Will the GC be blocked during the
> call into a C function?

If we adopt the rules described in https://golang.org/issue/12416,
then it is fine to allocate a []byte in Go, pass the address of
element 0 of the slice to C, and let C code fill in the bytes.

The GC is not blocked during a call to a C function.

Never use reflect.SliceHeader. It was a mistake to make it an
exported type. As the docs say, it can not be used safely or
portably.

Ian

thebroke...@gmail.com

Oct 28, 2015, 2:09:49 AM
to golang-nuts, schm...@gmail.com, ma...@peereboom.us
I finished a Brotli decoder in pure Go; you can find the package here:
After implementing this, I feel like Brotli is at least 2-3x more complicated than DEFLATE. At least I could keep the entire DEFLATE specification in my head... I couldn't do that for Brotli. Currently, the library passes on all the test files in the official Brotli repo. The next steps for the package will be:
  1. Writing more unit tests to ensure correctness of the decoder. I want to hit as many edge conditions as possible, especially the places where I was not sure exactly what the specification dictated.
  2. Improving performance of the decoder.
  3. Starting work on an encoder.
Enjoy,
JT

Mike Houston

Oct 28, 2015, 8:23:57 AM
to golang-nuts, ma...@peereboom.us, thebroke...@gmail.com
Excellent, in which case I won't attempt to wrap the stream interface, as this seems like the better long-term solution.

Thanks for the clarification on passing pointers Ian, that is much clearer to me now.

Mike.

thebroke...@gmail.com

Oct 29, 2015, 12:08:58 AM
to golang-nuts, ma...@peereboom.us, thebroke...@gmail.com
It may still be nice to have a stream interface on the writer. I don't anticipate getting the encoder logic done anytime soon.

The decoder logic is about 4000 lines of C, while the encoder logic is about 6200 lines of C++.
My decoder ended up being about 1500 lines of Go; who knows how many lines the encoder will end up being. Furthermore, I'll probably need to spend some time learning how the encoder does its magic.

JT