Intent to Ship: Brotli (Accept-encoding: br on HTTPS connection)


Kenji Baheux

Jan 15, 2016, 3:46:37 AM
to blink-dev, eus...@chromium.org, Jyrki Alakuijala, Zoltan Szabadka, Lode Vandevenne

bcc: net...@chromium.org


Contact emails

Engineering: eus...@chromium.org, {jyrki, szabadka, lode}@google.com

PM: kenji...@chromium.org


Spec

IETF draft


Summary

Brotli is used in WOFF 2.0 web fonts with great success.


This intent to ship is about making Brotli available as a Content-Encoding method, advertised via Accept-Encoding: br. 


Important note: Brotli availability is restricted to HTTPS connections.
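A minimal server-side sketch of this negotiation, for illustration only. The helper names and the server preference order (br before gzip) are assumptions, not part of the intent. The `is_https` check is defensive: under this intent the browser only advertises br over HTTPS, but a server should not rely on that alone.

```python
def parse_accept_encoding(header: str) -> dict:
    """Map each content-coding in an Accept-Encoding header to its q-value."""
    prefs = {}
    for item in header.split(","):
        coding, _, params = item.strip().partition(";")
        coding = coding.strip().lower()
        if not coding:
            continue
        q = 1.0  # default weight when no q parameter is present
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs[coding] = q
    return prefs


def choose_content_encoding(header: str, is_https: bool,
                            server_order=("br", "gzip")) -> str:
    """Hypothetical helper: first server-supported coding the client accepts."""
    prefs = parse_accept_encoding(header)
    for coding in server_order:
        if coding == "br" and not is_https:
            continue  # mirror the HTTPS-only restriction for br
        q = prefs.get(coding, prefs.get("*", 0.0))
        if q > 0:
            return coding
    return "identity"
```

For example, `choose_content_encoding("gzip, deflate, br", is_https=False)` returns `"gzip"`, while the same header over HTTPS yields `"br"`. A server doing this should also send `Vary: Accept-Encoding` so caches keep the compressed variants apart.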


Advantages:

  • Brotli outperforms gzip on typical web assets (e.g. CSS, HTML, JS) by 17–25 % in compressed size.

  • Brotli -11 density compared to gzip -9:

    • HTML (multi-language corpus): 25 % savings

    • JS (Alexa top 10k): 17 % savings

    • minified JS (Alexa top 10k): 17 % savings

    • CSS (Alexa top 10k): 20 % savings


More details in the white paper.
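To reproduce this kind of comparison on your own assets, here is a rough sketch. It assumes "savings" means the reduction in compressed size relative to gzip -9 output (the white paper's exact methodology may differ). It uses the standard-library `gzip` module for the baseline and the third-party `brotli` Python package (`brotli.compress(data, quality=11)`) only if it is installed; `savings_percent` and `compare` are hypothetical helper names.

```python
import gzip

try:
    import brotli  # third-party "brotli" package; optional in this sketch
except ImportError:
    brotli = None


def savings_percent(gzip_size: int, brotli_size: int) -> float:
    """Percentage by which the Brotli output is smaller than the gzip output."""
    if gzip_size <= 0:
        raise ValueError("gzip_size must be positive")
    return 100.0 * (gzip_size - brotli_size) / gzip_size


def compare(data: bytes):
    """Return (gzip -9 size, brotli -11 size or None, savings % or None)."""
    # Note: gzip.compress includes the ~18-byte gzip header and trailer.
    gz = len(gzip.compress(data, compresslevel=9))
    if brotli is None:
        return gz, None, None
    br = len(brotli.compress(data, quality=11))
    return gz, br, savings_percent(gz, br)
```

For instance, `savings_percent(1000, 750)` gives `25.0`, the same kind of figure as the HTML number above.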



Link to “Intent to Implement” blink-dev discussion

Intent to implement thread



Is this feature supported on all six Blink platforms (Windows, Mac, Linux, Chrome OS, Android, and Android WebView)?

Yes.



Demo link

Brotli can be enabled in Chrome Canary via chrome://flags#enable-brotli


Here are a few live instances that we are aware of:

  • Google Fonts API serves CSS responses compressed with Brotli to supporting browsers (we observed a weighted average of 9% savings across the top 20 CSS requests with Brotli -6).

  • CloudFlare serves its blog from an experimental HTTP/2 server that also supports Brotli.
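The thread doesn't say how the 9% Google Fonts figure was weighted. One plausible reading, sketched here purely as an assumption, is a mean of per-resource savings weighted by request count; `weighted_savings` is a hypothetical helper, and any input tuples you feed it are your own data, not the Google Fonts measurements.

```python
def weighted_savings(samples) -> float:
    """Request-weighted mean of per-resource Brotli-vs-gzip savings, in percent.

    samples: iterable of (request_count, gzip_bytes, brotli_bytes) tuples.
    Assumption: weights are request counts, not bytes transferred.
    """
    total_weight = 0
    weighted_sum = 0.0
    for requests, gzip_bytes, brotli_bytes in samples:
        saving = (gzip_bytes - brotli_bytes) / gzip_bytes
        weighted_sum += requests * saving
        total_weight += requests
    if total_weight == 0:
        raise ValueError("no requests to average over")
    return 100.0 * weighted_sum / total_weight
```

Weighting by bytes instead (total bytes saved divided by total gzip bytes) can give a noticeably different number when large and small stylesheets compress differently, so the weighting should be stated alongside any such figure.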



Debuggability

DevTools correctly shows both the compressed and uncompressed sizes, as well as the correct value of the Content-Encoding header.



Interoperability and Compatibility Risk

Low risk:

  • The Brotli spec has been independently reviewed and implemented by Mark Adler.

  • Developer interest is high: from CDN vendors, tier-1 web properties and third parties, and in the form of nginx modules (CloudFlare, Google)...

  • Brotli has been in use in WOFF2 web fonts for a while with no compatibility issues.

  • Supporting Brotli as a content-encoding is rather straightforward for browsers that already support WOFF2, since they already ship a Brotli decoder:

    • WOFF2 is supported in Chrome, Opera, Firefox

    • support for WOFF2 in Safari has recently landed (my reading of webkit#150830)

    • support for WOFF2 is under consideration for Edge.

  • Firefox has supported Brotli since Firefox 44 (also restricted to HTTPS connections)

  • Brotli is under consideration by the Edge team (pending pull request by Kyle Pflug, Edge program manager)

  • Brotli support in Safari: no public signals other than upcoming WOFF2 support (with Brotli under the hood)




OWP launch tracking bug

launch bug


Entry on the feature dashboard

https://www.chromestatus.com/feature/5420797577396224


Other: Brotli github repository

Yoav Weiss

Jan 15, 2016, 5:40:24 AM
to Kenji Baheux, blink-dev, eus...@chromium.org, Jyrki Alakuijala, Zoltan Szabadka, Lode Vandevenne
Non-API-owner yay! \o/

--
You received this message because you are subscribed to the Google Groups "net-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to net-dev+u...@chromium.org.
To post to this group, send email to net...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/net-dev/CADWWn7WzCP_EpuY7eEDWA4%2BNOP1bmyxPyQozGwaBOhmLfnoYkQ%40mail.gmail.com.

Jochen Eisinger

Jan 15, 2016, 5:48:32 AM
to Yoav Weiss, Kenji Baheux, blink-dev, eus...@chromium.org, Jyrki Alakuijala, Zoltan Szabadka, Lode Vandevenne
lgtm

Chris Harrelson

Jan 15, 2016, 11:09:45 AM
to Jochen Eisinger, Yoav Weiss, Kenji Baheux, blink-dev, eus...@chromium.org, Jyrki Alakuijala, Zoltan Szabadka, Lode Vandevenne
LGTM2

You received this message because you are subscribed to the Google Groups "blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to blink-dev+...@chromium.org.

Jim Roskind

Jan 16, 2016, 12:22:45 AM
to Kenji Baheux, blink-dev, eus...@chromium.org, Jyrki Alakuijala, Zoltan Szabadka, Lode Vandevenne
FYI: we removed the bzip content encoding, even though it outperformed gzip, because middleboxes routinely corrupted the data.

The most extreme class of middlebox included Vodafone UK (for instance), which "tried to improve" on unknown content encodings by doing the following (bizarre) things:

a) Remove the unrecognized content encoding
b) Pass the (already compressed!) content through another gzip encoding pass
c) Claim that the content was merely encoded as gzip <ugh!?!?!>

Some code that handled sdch encoding actually "gracefully" handles such molestation by *expecting* to get "sdch,gzip" content; when it doesn't arrive as expected, it falls back to decoding it as "sdch,gzip,gzip." Not very pretty... but very functional.

Sadly, for bzip there was no way of anticipating such molestation by middleboxes... so we yanked bzip, which resolved complaints from tons of users.

Please anticipate (if history taught us anything) that a pile of customers will complain about "not being able to load content" from sites that use this new content encoding.
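To make the sdch-style fallback concrete, here is a sketch of the idea using Python's standard `gzip` module. It is not Chromium's code: `middlebox_rewrap` and `decode_gzip_with_fallback` are hypothetical, and the heuristic simply checks whether a gunzipped body still starts with the gzip magic bytes (`1f 8b`), which is what an extra middlebox gzip pass leaves behind.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"


def middlebox_rewrap(body: bytes) -> bytes:
    """Simulate the middlebox: gzip an already-encoded body and relabel it gzip."""
    return gzip.compress(body)


def decode_gzip_with_fallback(body: bytes, max_layers: int = 2) -> bytes:
    """Gunzip, then keep gunzipping while the output still looks like gzip."""
    data = gzip.decompress(body)
    layers = 1
    while data.startswith(GZIP_MAGIC) and layers < max_layers:
        try:
            data = gzip.decompress(data)
        except (OSError, EOFError, zlib.error):
            break  # the magic bytes were a coincidence; keep what we have
        layers += 1
    return data
```

The heuristic is exactly the "not very pretty" part: a legitimately served `.gz` file with `Content-Encoding: gzip` would be unwrapped one layer too far. It also shows why bzip could not be rescued the same way; once the middlebox has stripped the original label, the client has no reliable signal for which decoder to try.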

Jim




Jim Roskind

Jan 16, 2016, 12:25:25 AM
to Kenji Baheux, blink-dev, eus...@chromium.org, Jyrki Alakuijala, Zoltan Szabadka, Lode Vandevenne
...but thinking some more, I guess it is possible that SSL will "protect us" from such molestation... but I'm not sure whether some ISPs (for mobile?) do SSL termination and "helpful fixups" before forwarding data to a mobile client that has "extra" certificates to facilitate the MITM activity.

YMMV,

Jim

Kenji Baheux

Jan 17, 2016, 8:21:46 PM
to Jim Roskind, blink-dev, eus...@chromium.org, Jyrki Alakuijala, Zoltan Szabadka, Lode Vandevenne
Avoiding (at least mitigating) the "helpful fixups" was indeed another reason for limiting this to HTTPS.

Philip Jägenstedt

Jan 19, 2016, 10:51:26 AM
to Kenji Baheux, Jim Roskind, blink-dev, eus...@chromium.org, Jyrki Alakuijala, Zoltan Szabadka, Lode Vandevenne
$ echo LGTM3 | bro.py | hexdump -C
00000000  8b 02 80 4c 47 54 4d 33  0a 03                    |...LGTM3..|
0000000a

(Unsurprisingly, it takes a longer input to actually save any bytes.)
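Why no bytes are saved here: the 10 output bytes are a stream header plus an *uncompressed* meta-block holding the 6 input bytes, followed by an empty final meta-block. The sketch below decodes exactly that case, following the bit layout of the Brotli format as later published in RFC 7932. It deliberately refuses compressed and metadata meta-blocks, so it is an illustration of the framing, not a usable decoder; all names are hypothetical.

```python
class BitReader:
    """Reads bits least-significant-bit first, as the Brotli format requires."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # absolute bit position

    def read(self, n: int) -> int:
        value = 0
        for i in range(n):
            byte = self.data[self.pos >> 3]
            value |= ((byte >> (self.pos & 7)) & 1) << i
            self.pos += 1
        return value

    def skip_to_byte_boundary(self) -> None:
        if self.pos & 7 and self.read(8 - (self.pos & 7)) != 0:
            raise ValueError("padding bits must be zero")

    def read_bytes(self, n: int) -> bytes:
        start = self.pos >> 3
        chunk = self.data[start:start + n]
        if len(chunk) != n:
            raise ValueError("truncated stream")
        self.pos += 8 * n
        return chunk


def read_window_bits(r: BitReader) -> int:
    """Decode the stream header's WBITS field (log2 of the sliding-window size)."""
    if r.read(1) == 0:
        return 16
    n = r.read(3)
    if n:
        return 17 + n
    n = r.read(3)
    if n == 1:
        raise ValueError("reserved WBITS pattern")
    return 17 if n == 0 else 8 + n


def decode_uncompressed_stream(data: bytes):
    """Return (window_bits, payload) for streams of uncompressed meta-blocks only."""
    r = BitReader(data)
    wbits = read_window_bits(r)
    out = bytearray()
    while True:
        is_last = r.read(1)
        if is_last and r.read(1):  # ISLASTEMPTY: empty final meta-block
            return wbits, bytes(out)
        nibbles_code = r.read(2)
        if nibbles_code == 3:
            raise NotImplementedError("metadata meta-blocks are out of scope")
        mlen = r.read(4 * (nibbles_code + 4)) + 1  # stored as MLEN - 1
        # ISUNCOMPRESSED exists only for non-final meta-blocks.
        if is_last or not r.read(1):
            raise NotImplementedError("compressed meta-blocks need Huffman decoding")
        r.skip_to_byte_boundary()
        out += r.read_bytes(mlen)
```

Feeding it the bytes from the hexdump, `decode_uncompressed_stream(bytes.fromhex("8b02804c47544d330a03"))` returns `(22, b"LGTM3\n")`: 3 header bytes (window bits 22, length 6, "uncompressed" flag), the 6 literal bytes, and 1 byte for the empty final meta-block.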


jms...@gmail.com

Jan 20, 2016, 6:26:55 PM
to blink-dev, eus...@chromium.org, jy...@google.com, szab...@google.com, lo...@google.com

This intent to ship is about making Brotli available as a Content-Encoding method, advertised via Accept-Encoding: br. 


Important note: Brotli availability is restricted to HTTPS connections.



This makes no sense. Small/medium embedded devices used as HTTP servers, for which secure access is not a concern, should not be prohibited from using a good compression algorithm.

jms...@gmail.com

Jan 20, 2016, 6:31:44 PM
to blink-dev, eus...@chromium.org, jy...@google.com, szab...@google.com, lo...@google.com, jms...@gmail.com

Brotli uses a pre-defined static dictionary of more than 13,000 strings to "warm up" its internal state

Oh -- never mind, that most likely precludes small/medium embedded devices. :-( 

Torne (Richard Coles)

Jan 20, 2016, 7:02:27 PM
to jms...@gmail.com, blink-dev, eus...@chromium.org, jy...@google.com, lo...@google.com, szab...@google.com
As mentioned earlier in the thread one reason why this is limited to https is to stop it being mangled by proxies, which has been a practical problem in the past with encodings.

maria.de.je...@gmail.com

Apr 5, 2016, 11:46:42 AM
to blink-dev, eus...@chromium.org, jy...@google.com, szab...@google.com, lo...@google.com



Harald Alvestrand

Apr 5, 2016, 12:41:26 PM
to maria.de.je...@gmail.com, blink-dev, eus...@chromium.org, Jyrki Alakuijala, szab...@google.com, Lode Vandevenne
Q: What is the perceived status of the Brotli specification?

I note that the current draft is an internet-draft at version -08, and the "other implementation" implements version -02. The draft also contains a giant block (200K) of binary data, which is an odd form of specification, and it is hard to verify independently that it's correct (for whatever "correct" means in this context).

I'd like to ensure that we know that it's really well specified.


Torne (Richard Coles)

Apr 5, 2016, 3:08:42 PM
to Harald Alvestrand, maria.de.je...@gmail.com, Jyrki Alakuijala, Lode Vandevenne, blink-dev, eus...@chromium.org, szab...@google.com
There isn't really anything unusual about a compression format containing a large block of binary data; it's a predefined dictionary. It is by definition correct: it's the dictionary from the spec, and so as long as all implementations use the dictionary from the spec they will interoperate.

Harald Alvestrand

Apr 5, 2016, 3:59:56 PM
to Torne (Richard Coles), Maria Garza, Jyrki Alakuijala, Lode Vandevenne, blink-dev, Eugene Kliuchnikov, Zoltan Szabadka
The binary blob may be a dictionary, and it may be correct.
The presentation format makes it very hard for me to check either assumption.

Torne (Richard Coles)

Apr 5, 2016, 4:45:55 PM
to Harald Alvestrand, Maria Garza, Jyrki Alakuijala, Lode Vandevenne, blink-dev, Eugene Kliuchnikov, Zoltan Szabadka

You can see that it is a dictionary by reading the algorithm to see how it uses it. How else would you expect it to be presented? It isn't going to be human readable in any form; it's composed of substrings of a giant corpus mixed together in no really meaningful order.

Harald Alvestrand

Apr 5, 2016, 5:00:59 PM
to Torne (Richard Coles), Maria Garza, Jyrki Alakuijala, Lode Vandevenne, blink-dev, Eugene Kliuchnikov, Zoltan Szabadka
Flipside: If there's an error in the dictionary, how would you detect it?
Or even if someone had managed to embed malicious code inside it that was triggered by a particular bug in a particular implementation - how would you detect it?

I don't know what alternatives are reasonable (not having tried to reverse-engineer the dictionary from the algorithm); offhand, presenting the set of substrings and the algorithm to construct the dictionary from them seems like an obvious possibility.

Jyrki Alakuijala

Apr 6, 2016, 5:38:34 AM
to jms...@gmail.com, blink-dev, eus...@chromium.org, Zoltan Szabadka, Lode Vandevenne
On Thu, Jan 21, 2016 at 12:31 AM, <jms...@gmail.com> wrote:

Brotli uses a pre-defined static dictionary of more than 13,000 strings to "warm up" its internal state

Oh -- never mind, that most likely precludes small/medium embedded devices. :-( 

Brotli's use of the 122 kB static dictionary is optional in the encoder -- but not in the decoder. The resource-wise great thing about the static dictionary is that it can be a shared resource for multiple processes or threads doing compression or decompression, whereas a dynamic dictionary (which brotli also uses) is always a single-thread (or even worse, single compression context) resource. In some cases the 122 kB static dictionary can even lead to an overall reduction in resource use. Of course, you wouldn't use it on a small 8-bit microcontroller device.

If you choose to use Brotli without the static dictionary, you will see a small drop in compression density for small text/web documents. Of brotli's total improvement of ~20 % over zopfli, about 15 % comes from improvements other than the static dictionary and the last 5 % from the static dictionary. There is no difference for binaries or for longer documents.

Jyrki Alakuijala

Apr 6, 2016, 8:46:57 AM
to Harald Alvestrand, Torne (Richard Coles), Maria Garza, Lode Vandevenne, blink-dev, Eugene Kliuchnikov, Zoltan Szabadka
On Tue, Apr 5, 2016 at 11:00 PM, Harald Alvestrand <h...@google.com> wrote:
Flipside: If there's an error in the dictionary, how would you detect it?

https://datatracker.ietf.org/doc/draft-alakuijala-brotli/?include_text=1 Appendix A defines the dictionary together with its checksum.

For testing a decoder for correctness, we can synthesize a brotli stream that emits all dictionary entries with all transforms, and compare the output with a known good output. This is not a theoretical proposal; at least two independent brotli stream synthesizers exist today.

If you are curious and want to play with the dictionary, but don't want to convert the spec or the brotli implementation to bytes, http://www.gstatic.com/b/d is an easy access to the bytes of the brotli dictionary.
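A small sketch of the check Jyrki describes: hash a local copy of the dictionary bytes (for example, extracted from the spec's appendix or downloaded from the gstatic URL in the paragraph above) and compare the result with the checksum published alongside it. The function name is hypothetical, and the hash algorithm is a parameter because you should use whichever one the spec's Appendix A actually names; SHA-256 is only an assumed default here.

```python
import hashlib

DICTIONARY_SIZE = 122784  # bytes in the Brotli static dictionary (RFC 7932)


def verify_dictionary(data: bytes, expected_hex: str,
                      algorithm: str = "sha256",
                      expected_size: int = DICTIONARY_SIZE) -> bool:
    """True if `data` has the expected size and hashes to `expected_hex`.

    Assumption: `algorithm` defaults to SHA-256; use the one the spec lists.
    """
    if len(data) != expected_size:
        return False
    digest = hashlib.new(algorithm, data).hexdigest()
    return digest == expected_hex.strip().lower()
```

Usage would look like `verify_dictionary(open("dictionary.bin", "rb").read(), "<hex digest from the spec>")`, where the file name is hypothetical. This confirms that your bytes match the published ones; combined with the synthesized-stream test above, it covers both the data and the decoder.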

Brotli itself does not execute (as machine instructions) bytes from the dictionary or from the compressed or decompressed streams. For best protection an implementation of brotli would keep a copy of the dictionary in a non-execute and non-writeable area of the process memory. One such common possibility for an implementation is to keep the static dictionary in the data segment: https://en.wikipedia.org/wiki/Data_segment
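Python can't place data in a program's data segment, but the two properties mentioned here (read-only, shareable between processes) can be sketched with a read-only memory map of a dictionary file. This is an analogue, not how the brotli library stores its dictionary, and `map_dictionary_readonly` plus any file path you pass it are hypothetical. On POSIX systems `mmap.ACCESS_READ` maps the pages with read permission only (no write, no execute), and processes mapping the same file can share the underlying page-cache pages.

```python
import mmap


def map_dictionary_readonly(path: str) -> mmap.mmap:
    """Map a dictionary file read-only; writes through the mapping raise TypeError."""
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```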

Most of the non-ASCII-text bytes in the dictionary are fragments of UTF-8 encoded words in Chinese, Russian, Arabic, and Hindi. In addition to the UTF-8 encoded strings there are 24 binary words, all in the length 8 category -- a total of 192 bytes. I handpicked them based on a statistical analysis over a larger set of binary files, and pruned the handpicked set by what actually turned out useful for compression after comparing with vanilla LZ77 and entropy encoding. For your convenience, here are six example binary dictionary entries:
0, 0, 0, 0, 0, 0, 0, 0
0, 1, 2, 3, 4, 5, 6, 7
7, 6, 5, 4, 3, 2, 1, 0
8, 9, 10, 11, 12, 13, 14, 15
1, 0, 0, 0, 2, 0, 0, 0
0, 0, 255, 255, 0, 1, 0, 0

About the ordering of dictionary words: Within each length and group the ordering of words was defined with an iterative algorithm, where the highest compression gain words were added first. This ordering was chosen because it gives an additional compression gain over other orderings.
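For readers following Harald's question about how the blob is organized, here is the word layout from the Static Dictionary section of RFC 7932 (the published version of this draft), sketched in Python. The `NDBITS` table gives, for each word length 4 to 24, log2 of the number of words of that length; words of one length are stored back to back, so any word's offset follows from the table alone. The helper names are hypothetical.

```python
# log2(number of words) for each word length 0..24; lengths 0-3 have no words.
NDBITS = [0, 0, 0, 0, 10, 10, 11, 11, 10, 10, 10, 10, 10,
          9, 9, 8, 7, 7, 8, 7, 7, 6, 6, 5, 5]


def build_offsets(ndbits):
    """Return (DOFFSET table, total dictionary size in bytes)."""
    offsets = [0] * len(ndbits)
    pos = 0
    for length in range(4, len(ndbits)):
        offsets[length] = pos
        pos += length * (1 << ndbits[length])
    return offsets, pos


DOFFSET, DICTIONARY_SIZE = build_offsets(NDBITS)
WORD_COUNT = sum(1 << NDBITS[n] for n in range(4, len(NDBITS)))


def dictionary_word(dictionary: bytes, length: int, index: int) -> bytes:
    """Slice word number `index` of the given length out of the raw dictionary."""
    if not 4 <= length <= 24:
        raise ValueError("dictionary words are 4 to 24 bytes long")
    if not 0 <= index < (1 << NDBITS[length]):
        raise ValueError("word index out of range for this length")
    start = DOFFSET[length] + index * length
    return dictionary[start:start + length]
```

Here `DICTIONARY_SIZE` evaluates to 122784 and `WORD_COUNT` to 13504, which line up with the "122 kB" and "more than 13,000 strings" figures quoted earlier in the thread. Given the raw dictionary bytes, `dictionary_word(d, 8, i)` returns one of the length-8 words, the category where the binary entries listed above live.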

If you have more questions about brotli, I'm happy to try to answer them.